AD-A051 753

UNCLASSIFIED

NORTH CAROLINA UNIV AT CHAPEL HILL DEPT OF STATISTICS F/G 12/1

SOME STATISTICAL PROCEDURES BASED ON DISTANCES (U)

NOV 76 J J WALKER DAAG29-74-C-0030

MIMEO SERIES 1096 ARO-11959.19-M


THE CONSOLIDATED UNIVERSITY
OF NORTH CAROLINA


SOME STATISTICAL PROCEDURES BASED ON DISTANCES


Joseph J. Walker


Mimeo Series #1096


November 1976


DEPARTMENT OF STATISTICS
Chapel Hill, North Carolina


Approved for public release;
Distribution Unlimited


JOSEPH J. WALKER. Some Statistical Procedures Based on Distances.
(Under the direction of I.M. CHAKRAVARTI and N.L. JOHNSON.)


A criterion is proposed for classifying multivariate "observations" according to their populations of origin when the observable data are the distances between pairs of "observations," with these distances themselves subject to further variation, such as measurement error.

The same basic problem is investigated under several assumptions on the underlying normal distributions. In each case, the criterion is shown to be a particular quadratic form in normal variables. In the simplest case considered, a computational form for the distribution is given. An asymptotic expansion is developed which provides an approximation to the distribution in other cases. The accuracy of the approximation is investigated numerically.

The related problem of estimating the noncentrality parameter of a noncentral chi-squared random variable is also investigated. An estimator is proposed which is based on the two-sample Wilcoxon statistic, using independent samples from the central and noncentral chi-squared distributions. The estimator has the property that it is invariant under monotonic transformations of the observed data. Further properties of the estimator are derived, and its asymptotic relative efficiency with respect to the maximum likelihood estimator is investigated numerically.


This research was supported in part by the Army Research Office (Contract DAAG29-74-C-0030) and the National Science Foundation (Grant CP 42325).




ACKNOWLEDGEMENTS 


I wish to express my greatest thanks to Professors N.L. Johnson and I.M. Chakravarti, who served as advisors in this research. Their guidance and encouragement during the course of this investigation were invaluable.

My sincere thanks also go to the other members of the examining committee, Professors G.D. Simons, R.J. Carroll, and C.M. Suchindran. Their agreement to read the manuscript form of this dissertation was of great help, as were their comments on it, particularly those of Professor Simons.

I also wish to express my gratitude to all of the other members of the faculty of the Department of Statistics who have contributed to my education. In particular, I thank Professor E.J. Wegman, who was my advisor for some of my graduate years, and Professor M.R. Leadbetter, without whose help in obtaining a master's degree prior to being drafted I probably would have been sent to Vietnam.

I am indebted to the U.S. Army Research Office (contract DAAG29-74-C-0030) and to the National Science Foundation (grant CP42325) for their partial support of the research contained in this dissertation. I am also grateful for the financial help received through the Veterans Administration.

For her excellent typing of the manuscript, I thank Ms. June Maxwell.

Finally, I must express my deep appreciation of the contributions of my parents. Their sacrifices made my undergraduate education possible, and their continued moral support and occasional emergency financial help sustained me throughout this endeavor.


TABLE OF CONTENTS

1. Introduction

2. Basic Classification
   2.1 Introduction
   2.2 A criterion for the one population problem
   2.3 Asymptotic expansion for distribution of $S_0$
   2.4 Accuracy of asymptotic expansion

3. Classification Under More Complicated Models
   3.1 Noncentral distribution of $S_0$
   3.2 Classification of several new objects
   3.3 Discrimination between two populations
   3.4 Extensions and unsolved problems

4. Noncentrality Estimation
   4.1 Introduction
   4.2 Maximum likelihood estimation
   4.3 A randomized estimation procedure
   4.4 Asymptotic relative efficiency of $\lambda$ and $\lambda^*$
   4.5 Extensions and generalizations

Bibliography

Appendix


1. INTRODUCTION

In most statistical problems which are analyzed through multivariate methods, it is assumed that we have several characteristics of interest and measurements of each of those characteristics for each of several individuals. Thus if there are $p$ characteristics and $n$ individuals, the data would consist of $n$ $p$-dimensional vectors $x_i' = (x_{i1},\dots,x_{ip})$, $i = 1,\dots,n$. Inferences concerning the distribution of the random vectors can then be made using the given data.

The primary aim of this investigation is to explore the making of inferences when the observable data are not the vectors described above, but rather some measurements of how "far apart" pairs of individuals are. Thus if we think of the $n$ vectors as representing the positions of $n$ points in $p$-dimensional space, each point representing an individual or object, then the observable data might be the Euclidean (or some other) distance between pairs of points or individuals. To complicate the problem further, there could be additional variation arising out of errors of measurement. If our real interest is in making inferences concerning the underlying distribution of the vectors, then these additional measurement errors could be thought of as distortions of the "true" distances, those distances between the points with positions given by the vectors.


In some cases the underlying distribution may be masked even more. For example, the observed data may consist of some subjective judgments or perceptions as to the degree of similarity (or dissimilarity) of the various individuals or objects. In such cases, we can still assume that there is some underlying distribution of characteristics of the individuals and that the perceptions of similarity are related in some way to the distances between individuals in the characteristic space.

Shepard [25] gives a summary of data of this type and various other related types, along with a discussion of several methods of analysis.

Before discussing the specific topics of this investigation, we shall briefly mention several examples involving data of this sort. In these cases, methods of analysis have been developed for making inferences based on distances or distance-like data.

Paired comparison analysis is a technique for obtaining an ordering of a set of objects with respect to some property, which is usually measured only in a subjective way (e.g., taste). The procedure is discussed extensively by David [5]. Typically, the objects are presented in pairs to judges who indicate which of each pair rates higher. The absolute value of the difference between the number of judges rating object $i$ higher than object $j$ and the number rating $j$ higher than $i$ could be interpreted as a measure of dissimilarity between the two objects: the closer to zero the dissimilarity is, the less sure the judges are about the difference (i.e., the "closer together" the objects are). Here the "measurement error" does not enter directly into the final measure of dissimilarity, but rather is a factor in the judges' decisions as to which objects rate higher.
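The dissimilarity just described is easy to compute from a table of the judges' verdicts. In the sketch below, `wins[i][j]` is assumed to be the number of judges who rated object $i$ above object $j$; this data layout, the function name and the counts are hypothetical and chosen only for illustration.

```python
def paired_comparison_dissimilarity(wins):
    """Return d with d[i][j] = |wins[i][j] - wins[j][i]|.

    wins[i][j] is the (hypothetical) number of judges who rated object i
    higher than object j; diagonal entries are ignored.
    """
    k = len(wins)
    return [[0 if i == j else abs(wins[i][j] - wins[j][i]) for j in range(k)]
            for i in range(k)]


# Ten judges compare each pair of three objects.
wins = [
    [0, 5, 9],
    [5, 0, 7],
    [1, 3, 0],
]
d = paired_comparison_dissimilarity(wins)
print(d)  # [[0, 0, 8], [0, 0, 4], [8, 4, 0]]
```

Objects 0 and 1 split the ten judges evenly, so their dissimilarity is 0 (the judges cannot separate them), while objects 0 and 2 are the farthest apart.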

Cluster analysis embraces a wide variety of techniques for dividing a collection of objects into groups in such a way that those in a given group are more similar to each other, according to some criterion, than they are to those in the other groups. There are a number of reviews of the various techniques in the literature (e.g., Cormack [3], Hartigan [10]). Although not all clustering methods use distance or similarity data, a great many do. The similarities are sometimes subjective measures or judgments, as in paired comparison methods. In other cases, they are direct measurements or estimates of the distances between pairs of objects. In fact, even when the vectors of observed characteristics are available, those data are often converted to a similarity matrix, for example by calculating the correlation coefficients among the characteristics. Some of the results obtained in this investigation, while not directly applicable to the clustering problem, are related to it and might be adapted to it as criteria for deciding when objects belong to the same cluster.

A problem which is related to clustering, but is considerably simpler to analyze statistically, is that of classification or discrimination, specifically the classification of one or more objects as belonging to one of several known groups or populations. An even more basic problem is that of deciding whether or not one or more objects belong to a given group, or to one of several groups.

When the underlying distributions of the populations are normal, such problems have been extensively investigated (e.g., Anderson [2], chapter 6; Kendall and Stuart [15], chapter 44). Some work has also been done with nonparametric classification rules (e.g., Das Gupta [4]).

All of the classification techniques mentioned in the above references, however, assume that the various characteristics can themselves be measured. The author is unaware of any similar results using only distance or dissimilarity data. Chapters two and three of this dissertation address those questions when the underlying population distributions are normal and the observed distances between objects contain additional measurement error. In Chapter two, a rule for deciding whether or not an object is from a certain population is proposed, and its distribution in each case is investigated. Chapter three contains extensions of this basic decision problem to more complicated situations: specifically, the inclusion of several objects to be classified and the problem of assigning an object to one of two known populations.

It will be seen that the results obtained here involve the distributions of various quadratic forms in normal variables. While most of the results used here are taken from Johnson and Kotz [11], chapter 29, it should be pointed out that they were originally derived by others. For example, we have used a representation due to Gurland [8], [9]. Other early work in the area was done, for example, by Robbins [22]. A representation due to Press [20], mentioned in Section 3.3, may allow generalization of some of the results obtained here.

Chapter four is not connected directly with the other two main chapters, but it is indirectly related to them. Under the assumption of normality for the distribution of the measurement error, the resulting distances are related to noncentral chi-squared random variables (with the noncentrality parameter indicating the true distance). A related problem is the estimation of the noncentrality parameter. If we have estimates of the distance between the means of two (possibly) different populations, we can use them to make a judgment as to whether the true distance between population means is positive, i.e., whether the populations are different. Maximum likelihood estimation of the noncentrality parameter leads to a rather complicated equation which often does not have a simple solution (see, e.g., Johnson and Kotz [11], p. 136). In Chapter four, we propose a simple estimator based on the two-sample Wilcoxon statistic and investigate some of its properties.

Finally, mention should be made here of the technique called multidimensional scaling, which was introduced by Shepard [23], [24] and Kruskal [16], [17]. The basic goal of the procedure is to obtain a representation of a set of objects as a relatively low-dimensional configuration of points, such that the distances between pairs of points closely correspond to the observed distances or dissimilarities between the respective objects. The methods used are iterative, successively adjusting the positions of the points until the rank ordering of the distances is as similar as possible to that of the dissimilarities, according to some criterion. Much of the motivation for the research presented here came from attempts to find a rigorous statistical analysis for the scaling problem. That goal was not achieved and, to the author's knowledge, has not been achieved by others either. Perhaps some of the results given here will find application at a later date to more complicated scaling problems.


2. BASIC CLASSIFICATION


2.1 Introduction

As indicated in Chapter one, the classification of objects according to the populations from which they originated has been studied quite extensively. If we assume that each observation consists of the measurements of $p$ characteristics for a given object, the assignment to a population can be made based on these measurements (see, e.g., Anderson [2], chapter 6). The classification criteria which result often can be interpreted in terms of the "distances" of the observations from the populations in question. For example, consider the problem of classifying a single $p$-variate observation as being from one of two known multivariate normal populations (see Anderson [2], section 6.4). If the observed measurements are $X' = (X_1,\dots,X_p)$ and the population distributions are $p$-variate normal distributions, denoted $N_p(\mu^{(i)},\Sigma)$, $i = 1,2$, where $\mu^{(i)\prime} = (\mu_1^{(i)},\dots,\mu_p^{(i)})$ and $\Sigma$ is a $p\times p$ positive definite symmetric matrix, then classification based on the discriminant function

$$X'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$$

can be shown to be optimal in terms of expected loss. Addition of the constant

$$-\tfrac{1}{2}(\mu^{(1)} + \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$$

to the discriminant function shows that the classification can be made equivalently on the basis of the difference

$$(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) - (X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)}),$$

since the discriminant function plus this constant equals $-\tfrac{1}{2}$ times the difference; that is, the classification rests on a comparison of the Mahalanobis distances of the observation from the centers (means) of the two populations.
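The equivalence can be checked numerically. The sketch below (NumPy assumed; the seed, dimension and random parameters are arbitrary) evaluates the discriminant function plus the constant and compares it with $-\tfrac{1}{2}$ times the difference of the two squared Mahalanobis distances.

```python
import numpy as np


def discriminant_with_constant(x, mu1, mu2, sigma):
    """x' Sigma^{-1}(mu1 - mu2) - (1/2)(mu1 + mu2)' Sigma^{-1}(mu1 - mu2)."""
    w = np.linalg.solve(sigma, mu1 - mu2)  # Sigma^{-1}(mu1 - mu2)
    return float(x @ w - 0.5 * (mu1 + mu2) @ w)


def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))


rng = np.random.default_rng(0)
p = 3
a = rng.normal(size=(p, p))
sigma = a @ a.T + p * np.eye(p)  # an arbitrary positive definite covariance
mu1, mu2, x = rng.normal(size=p), rng.normal(size=p), rng.normal(size=p)

lhs = discriminant_with_constant(x, mu1, mu2, sigma)
rhs = -0.5 * (mahalanobis_sq(x, mu1, sigma) - mahalanobis_sq(x, mu2, sigma))
print(np.isclose(lhs, rhs))  # True
```

Since the two quantities differ only by the factor $-\tfrac{1}{2}$, any cutoff on one translates directly into a cutoff on the other.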

In this chapter and the following one, we wish to take this dependence on distances a step further. We will still assume that a $p$-dimensional normal distribution exists for the populations under consideration and that the object to be classified has $p$ characteristics. However, these variates cannot be observed directly. The only observations which can be made are the distances of the object to be classified from other objects which are known to be from the given populations. We will also assume that an additional error is made in the determination of each distance. As discussed in the previous chapter, this additional error might correspond to measurement error or to the uncertainty introduced in similarity-dissimilarity types of measures. Specifically, in this chapter we will investigate the question of whether or not a given object is from a single given population, and in the next chapter we will investigate several extensions to more complicated situations. More complete descriptions of the models used will be given as they are needed.

2.2 A criterion for the one population problem

Probably the most basic problem which can be analyzed under the models considered here is whether or not a single new individual or object is from a given population. Before investigating this problem under the distance model, let us consider it from the classical point of view. Suppose we have a $p$-variate observation $X' = (X_1,\dots,X_p)$ and wish to decide whether it has arisen from a normal distribution with mean $\mu' = (\mu_1,\dots,\mu_p)$ and covariance matrix $\Sigma$. Assuming that both $\mu$ and $\Sigma$ are known, we could consider the statistic

$$s = (X - \mu)'\Sigma^{-1}(X - \mu),$$

that is, the Mahalanobis distance of the observation from the mean of the distribution. If $X$ is from the given population, then $s$ is distributed as a chi-squared random variable with $p$ degrees of freedom, and we can use that distribution for making inferences concerning whether $X$ is from the population.
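The chi-squared null distribution of $s$ can be checked by simulation. The sketch below (NumPy assumed; the seed, $p$ and $\Sigma$ are arbitrary) draws $X$ from $N_p(\mu,\Sigma)$ itself and compares the sample mean and variance of $s$ with $p$ and $2p$, the mean and variance of a $\chi^2_p$ variable.

```python
import numpy as np


def mahalanobis_statistic(x, mu, sigma):
    """s = (x - mu)' Sigma^{-1} (x - mu) for a single observation x."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))


rng = np.random.default_rng(1)
p = 4
mu = np.zeros(p)
a = rng.normal(size=(p, p))
sigma = a @ a.T + np.eye(p)  # arbitrary positive definite covariance

# Observations drawn from the hypothesized population (the null case).
draws = rng.multivariate_normal(mu, sigma, size=20000)
s = np.array([mahalanobis_statistic(x, mu, sigma) for x in draws])
print(round(s.mean(), 2), round(s.var(), 2))  # close to p = 4 and 2p = 8
```

In practice one would reject the hypothesis that $X$ came from the population when $s$ exceeds an upper percentage point of $\chi^2_p$.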

Returning to the problem based only on distance observations, we can derive a comparable result based on the distances of the new object from others known to be from the population. First, however, we must specify more completely the distance model which we are utilizing.

(A) We assume that there are $n$ individuals known to be from a population and one individual which may or may not be from the same population. Let $X_1, X_2, \dots, X_n$ and $X_0$ be the respective (unobservable) values of the $p$-dimensional random variable upon which the classification is to be based, where for $i = 0,1,\dots,n$, $X_i' = (X_{i1},\dots,X_{ip})$. We will assume that the population distribution is normal with mean $\mu$ and covariance matrix $\Sigma$. Thus $X_0, X_1, \dots, X_n$ are independent random variables, with $X_1,\dots,X_n$ each distributed as $N_p(\mu,\Sigma)$ and $X_0$ distributed as $N_p(\mu_0,\Sigma)$. We assume that the additional error and resulting distance measurements have the following structure: let $Y_{01}, Y_{10}, Y_{02}, Y_{20}, \dots, Y_{0n}, Y_{n0}$ be independent, identically distributed normal random variables with mean $0$ and covariance matrix $\Delta$, and for $i = 1,2,\dots,n$, let

$$S_{0i} = (X_0 + Y_{0i} - X_i - Y_{i0})'\Sigma^{-1}(X_0 + Y_{0i} - X_i - Y_{i0}).$$

Then $S_{01},\dots,S_{0n}$ are the observed distances of the new individual from the $n$ individuals known to be from the population. Notice that we let $Y_{ij}$ be the error made in determining the position of individual $i$ in the measurement of the distance from $i$ to $j$, and we assume all such errors to be independent. However, $S_{01},\dots,S_{0n}$ are not independent, since all depend on the value of the random variable $X_0$.
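Model (A) is easy to simulate, which is useful for checking the distributional results that follow. The sketch below (NumPy assumed; the function name `simulate_distances` and all parameter values are hypothetical) draws one realization of $S_{01},\dots,S_{0n}$, giving every ordered pair its own independent error vector as the model specifies.

```python
import numpy as np


def simulate_distances(n, mu, mu0, sigma, delta, rng):
    """Draw one realization of S_01, ..., S_0n under model (A).

    X_1, ..., X_n ~ N_p(mu, Sigma) and X_0 ~ N_p(mu0, Sigma); every ordered
    pair (i, j) has its own independent positional error Y_ij ~ N_p(0, Delta).
    """
    p = len(mu)
    zero = np.zeros(p)
    x = rng.multivariate_normal(mu, sigma, size=n)     # rows X_1, ..., X_n
    x0 = rng.multivariate_normal(mu0, sigma)           # X_0
    y0 = rng.multivariate_normal(zero, delta, size=n)  # rows Y_01, ..., Y_0n
    yi = rng.multivariate_normal(zero, delta, size=n)  # rows Y_10, ..., Y_n0
    u = (x0 + y0) - (x + yi)                           # rows U_0i
    # S_0i = U_0i' Sigma^{-1} U_0i, computed row by row
    return np.einsum("ij,ij->i", u, np.linalg.solve(sigma, u.T).T)


rng = np.random.default_rng(2)
p, n = 2, 5
s = simulate_distances(n, np.zeros(p), np.zeros(p), np.eye(p), 0.1 * np.eye(p), rng)
print(s, s.mean())  # the n observed distances and their average
```

Here $\mu_0 = \mu$, $\Sigma = I_2$ and $\Delta = 0.1 I_2$, so each $S_{0i}$ has expectation $\operatorname{tr}\{2\Sigma^{-1}(\Sigma + \Delta)\} = 4.4$.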


Since $X_1,\dots,X_n$ are dispersed about their population mean, the natural analogue to the distance from the mean in the simpler case would be the average distance of the individual to be classified from those known to be from the population, i.e.,

$$S_0 = n^{-1}\sum_{i=1}^{n} S_{0i}.$$

This is the criterion which we shall investigate. Let us first, however, examine the relationship between $S_0$ and the criterion we would use if we could actually observe $X_0, X_1, \dots, X_n$, in order to judge the appropriateness of $S_0$. We will use the following lemma:


Lemma 2.2.1. If $X_0, X_1, \dots, X_n$ are arbitrary $p$-dimensional vectors and $A$ is a symmetric matrix, then

$$\sum_{i=1}^{n}(X_0 - X_i)'A(X_0 - X_i) = n(X_0 - \bar X)'A(X_0 - \bar X) + \sum_{i=1}^{n}(X_i - \bar X)'A(X_i - \bar X),$$

where $\bar X = n^{-1}\sum_{i=1}^{n} X_i$.


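Proof: Since $x'Ax = \operatorname{tr}(Axx')$ for any vector $x$, and $\operatorname{tr}(BC) = \operatorname{tr}(CB)$ whenever $B$ and $C$ are conformable, we have

$$\begin{aligned}
\sum_{i=1}^{n}(X_0 - X_i)'A(X_0 - X_i) &= \operatorname{tr}\Big\{A\sum_{i=1}^{n}(X_0 - X_i)(X_0 - X_i)'\Big\} \\
&= \operatorname{tr}\Big\{A\sum_{i=1}^{n}(X_0 - \bar X + \bar X - X_i)(X_0 - \bar X + \bar X - X_i)'\Big\} \\
&= \operatorname{tr}\Big\{A\Big[\sum_{i=1}^{n}(X_0 - \bar X)(X_0 - \bar X)' + \sum_{i=1}^{n}(\bar X - X_i)(\bar X - X_i)'\Big]\Big\} \\
&= n(X_0 - \bar X)'A(X_0 - \bar X) + \sum_{i=1}^{n}(X_i - \bar X)'A(X_i - \bar X),
\end{aligned}$$

where the cross-product terms vanish in the third step because $\sum_{i=1}^{n}(\bar X - X_i) = 0$. □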
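Because the identity is purely algebraic, it holds for any symmetric $A$, definite or not. The sketch below (NumPy assumed; vectors and matrix drawn at random with an arbitrary seed) checks it with a symmetric matrix that is typically indefinite.

```python
import numpy as np


def quad(v, A):
    """Quadratic form v' A v."""
    return float(v @ A @ v)


rng = np.random.default_rng(3)
n, p = 6, 3
x0 = rng.normal(size=p)
x = rng.normal(size=(n, p))   # rows X_1, ..., X_n
b = rng.normal(size=(p, p))
A = b + b.T                   # symmetric, typically indefinite

xbar = x.mean(axis=0)
lhs = sum(quad(x0 - xi, A) for xi in x)
rhs = n * quad(x0 - xbar, A) + sum(quad(xi - xbar, A) for xi in x)
print(np.isclose(lhs, rhs))  # True
```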

Returning to the question of whether $X_0$ is from the same population as $X_1,\dots,X_n$, we see that, if the $X_i$'s were observable, we would have a simple test of hypothesis situation: we have $X_0$, which is distributed $N_p(\mu_0,\Sigma)$, and $X_1,\dots,X_n$, which are $N_p(\mu,\Sigma)$, and we wish to test the hypothesis $H_0\colon \mu_0 = \mu$ versus the alternative $H_1\colon \mu_0 \ne \mu$. This is a standard two-sample problem, and the usual methods of test construction (e.g., likelihood ratio) would lead to rejection of $H_0$ if $Q(X) > c$, where

$$Q(X) = (X_0 - \bar X)'\Sigma^{-1}(X_0 - \bar X).$$

We cannot observe the $X_i$'s themselves or $Q(X)$ either, but $S_0$, which is observable, is related to $Q(X)$. For the moment, let us ignore the errors made in determining the positions of the $X_i$'s for the distance measurements. (For convenience here, we shall refer to these errors, the $Y$'s in the model, as contaminating variables, since they distort the positions of the $X$'s.) Then by Lemma 2.2.1 we would have

$$\begin{aligned}
S_0 = n^{-1}\sum_{i=1}^{n} S_{0i} &= n^{-1}\sum_{i=1}^{n}(X_0 - X_i)'\Sigma^{-1}(X_0 - X_i) \\
&= (X_0 - \bar X)'\Sigma^{-1}(X_0 - \bar X) + n^{-1}\sum_{i=1}^{n}(X_i - \bar X)'\Sigma^{-1}(X_i - \bar X).
\end{aligned}$$

The first quantity on the right-hand side of the above equation is just $Q(X)$, and the second quantity is independent of $X_0$ and of $\bar X$, and hence of $Q(X)$; it is non-negative and does not depend on the hypothesis being tested. Thus a rejection region consisting of large values of $S_0$ and one consisting of large values of $Q(X)$ will be approximately equivalent, in the sense that a given set of $X_i$'s will tend to lead to rejection for either region or acceptance for either, regardless of which hypothesis is true. Thus $S_0$ would seem to be a reasonable test criterion to use in this simplified case.

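Reintroducing the contaminating variables, we can express $S_0$ as

$$\begin{aligned}
S_0 = {} & n^{-1}\sum_{i=1}^{n}(X_0 - X_i)'\Sigma^{-1}(X_0 - X_i) + 2n^{-1}\sum_{i=1}^{n}(X_0 - X_i)'\Sigma^{-1}(Y_{0i} - Y_{i0}) \\
& + n^{-1}\sum_{i=1}^{n}(Y_{0i} - Y_{i0})'\Sigma^{-1}(Y_{0i} - Y_{i0}).
\end{aligned}$$

Applying Lemma 2.2.1 with $A = \Sigma^{-1}$ to the vectors $U_i = X_0 - X_i + Y_{0i} - Y_{i0}$, whose average is $X_0 - \bar X + Z$, we have

$$S_0 = (X_0 - \bar X + Z)'\Sigma^{-1}(X_0 - \bar X + Z) + n^{-1}\sum_{i=1}^{n}(X_i - \bar X + Z - Y_{0i} + Y_{i0})'\Sigma^{-1}(X_i - \bar X + Z - Y_{0i} + Y_{i0}),$$

where $Z = n^{-1}\sum_{i=1}^{n}(Y_{0i} - Y_{i0})$. As before, the second term on the right-hand side does not depend on the hypothesis being tested. The first term, however, differs slightly from $Q(X)$; there is some distortion due to the contaminating variables. But $Z$ is independent of $X_0$ and $\bar X$, and has expected value $0$ and covariance matrix $2n^{-1}\Delta$. Thus if we assume that $\Delta$ is small relative to $\Sigma$, the first term differs only slightly from $Q(X)$, and a rejection region based on large values of $S_0$ would still appear to be a reasonable one to use.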
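The decomposition with contaminating variables can be confirmed numerically. The sketch below (NumPy assumed; $\Sigma$, the error scale 0.1 and the seed are arbitrary) computes $S_0$ directly and as the sum of the two terms above, and also prints the contaminated first term next to the uncontaminated $Q(X)$ for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 2
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma_inv = np.linalg.inv(sigma)
x0 = rng.normal(size=p)
x = rng.normal(size=(n, p))         # rows X_1, ..., X_n
y0 = 0.1 * rng.normal(size=(n, p))  # rows Y_01, ..., Y_0n (small errors)
yi = 0.1 * rng.normal(size=(n, p))  # rows Y_10, ..., Y_n0


def q(v):
    """v' Sigma^{-1} v."""
    return float(v @ sigma_inv @ v)


s0 = np.mean([q(x0 - x[i] + y0[i] - yi[i]) for i in range(n)])
xbar = x.mean(axis=0)
z = (y0 - yi).mean(axis=0)          # Z = n^{-1} sum_i (Y_0i - Y_i0)
first = q(x0 - xbar + z)
second = np.mean([q(x[i] - xbar + z - y0[i] + yi[i]) for i in range(n)])
print(np.isclose(s0, first + second))  # True
print(first, q(x0 - xbar))             # contaminated term vs. Q(X)
```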

Following the proofs of two lemmas, we shall investigate the null distribution of $S_0$, that is, the distribution when $\mu_0 = \mu$.


Notation: In the following lemma and thereafter where needed, $I_n$ will denote the $n\times n$ identity matrix, $J_n$ will denote the $n\times n$ matrix consisting entirely of 1's, $0$ will denote a matrix consisting entirely of 0's, and $\otimes$ will denote the Kronecker product of two matrices: if $A = ((a_{ij}))$ is $p\times p$ and $B$ is $q\times q$, then $A\otimes B$ is the $pq\times pq$ matrix

$$A\otimes B = \begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1p}B \\
a_{21}B & a_{22}B & \cdots & a_{2p}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1}B & a_{p2}B & \cdots & a_{pp}B
\end{pmatrix}.$$

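Lemma 2.2.2. Let $A$ and $B$ be $p\times p$ matrices and let $D = I_n\otimes A + J_n\otimes B$. Then

$$|D| = |A|^{n-1}\,|A + nB|.$$

(Note: This lemma is a variation of Exercise 1.3 in Rao [21].)

Proof: For $k = 1,2,\dots,p$ and $j = 1,2,\dots,n-1$, add the $(k+jp)$-th column of $D$ to the $k$-th column, giving

$$|D| = \begin{vmatrix}
A+nB & B & B & \cdots & B \\
A+nB & A+B & B & \cdots & B \\
A+nB & B & A+B & \cdots & B \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A+nB & B & B & \cdots & A+B
\end{vmatrix}.$$

For $k = 1,2,\dots,p$ and $j = 1,2,\dots,n-1$, subtract the $k$-th row from the $(k+jp)$-th row, giving

$$|D| = \begin{vmatrix}
A+nB & B & B & \cdots & B \\
0 & A & 0 & \cdots & 0 \\
0 & 0 & A & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & A
\end{vmatrix}.$$

But if $C$ and $E$ are square matrices, then for any conformable matrix $F$

$$\begin{vmatrix} C & F \\ 0 & E \end{vmatrix} = |C|\,|E|.$$

Applying this fact, first with $C = A+nB$ and then successively with $C = A$, we obtain the result. □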
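A quick numerical check of Lemma 2.2.2 follows (NumPy assumed; random $A$ and $B$ with an arbitrary seed). `numpy.kron(P, M)` forms the block matrix whose $(i,j)$ block is $p_{ij}M$, which matches the definition of $\otimes$ above; note that the row and column operations in the proof do not require $A$ and $B$ to be symmetric or to commute.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4, 3
A = rng.normal(size=(p, p))
B = rng.normal(size=(p, p))  # not symmetric, and A, B need not commute

D = np.kron(np.eye(n), A) + np.kron(np.ones((n, n)), B)  # I_n (x) A + J_n (x) B
lhs = np.linalg.det(D)
rhs = np.linalg.det(A) ** (n - 1) * np.linalg.det(A + n * B)
print(np.isclose(lhs, rhs))  # True
```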


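Lemma 2.2.3. Let $C$ be a symmetric $p\times p$ matrix having characteristic roots $\lambda_1,\dots,\lambda_p$ and matrix of associated orthonormal characteristic vectors $R$. Let $A = I_n\otimes C + J_n\otimes I_p$. Denote the characteristic roots of $A$ by $\alpha_1,\alpha_2,\dots,\alpha_{np}$. Then for $j = 1,2,\dots,p$ and $k = 1,2,\dots,n-1$,

$$\alpha_j = n + \lambda_j, \qquad \alpha_{kp+j} = \lambda_j, \tag{2.2.1}$$

and the matrix of associated orthonormal characteristic vectors of $A$ may be taken to be

$$Q = \begin{pmatrix}
R/\sqrt{n} & R/\sqrt{2} & R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)} \\
R/\sqrt{n} & -R/\sqrt{2} & R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)} \\
R/\sqrt{n} & 0 & -2R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R/\sqrt{n} & 0 & 0 & \cdots & -(n-1)R/\sqrt{n(n-1)}
\end{pmatrix}. \tag{2.2.2}$$

Note: Because of the multiplicity of the roots $\{\alpha_j\}$, $Q$ is not uniquely defined, but (2.2.2) is a convenient choice. We also note that the form of $Q$ will not be used until the next chapter, but it is convenient to give it here.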

Proof: To find the characteristic roots of $A$, we can solve the determinantal equation $|A - \alpha I_{np}| = 0$. But

$$A - \alpha I_{np} = I_n\otimes(C - \alpha I_p) + J_n\otimes I_p.$$

Thus by Lemma 2.2.2, with $A$ and $B$ of Lemma 2.2.2 being replaced respectively by $C - \alpha I_p$ and $I_p$,

$$|A - \alpha I_{np}| = |C - \alpha I_p|^{n-1}\,|C - \alpha I_p + nI_p| = |C - \alpha I_p|^{n-1}\,|C - (\alpha - n)I_p|.$$

Thus the solutions of $|A - \alpha I_{np}| = 0$ are those of $|C - \alpha I_p| = 0$, each occurring $n-1$ times, and those of $|C - (\alpha - n)I_p| = 0$, each once. Since $R'CR = \Lambda$, where $\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_p)$, it follows that the solutions of $|C - (\alpha - n)I_p| = 0$ are $n+\lambda_1,\dots,n+\lambda_p$, and (2.2.1) is proved.

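Let $R = (r_1, r_2, \dots, r_p)$. By the definition of characteristic roots and vectors, $Cr_i = \lambda_i r_i$ for $i = 1,2,\dots,p$. For $i = 1,2,\dots,p$ and $k = 1,2,\dots,n-1$, let $q_i$ and $q_{kp+i}$ be $np$-variate vectors such that

$$q_i' = \frac{1}{\sqrt{n}}(r_i', r_i', \dots, r_i'), \qquad q_{kp+i}' = \frac{1}{\sqrt{k(k+1)}}(r_i', \dots, r_i', -kr_i', 0', \dots, 0'),$$

where $-kr_i'$ is the $(k+1)$-st $p$-variate component of $q_{kp+i}'$. We must show that $Aq_i = (n+\lambda_i)q_i$ and $Aq_{kp+i} = \lambda_i q_{kp+i}$. But

$$Aq_i = \begin{pmatrix}
I_p + C & I_p & \cdots & I_p \\
I_p & I_p + C & \cdots & I_p \\
\vdots & \vdots & \ddots & \vdots \\
I_p & I_p & \cdots & I_p + C
\end{pmatrix}
\begin{pmatrix} r_i/\sqrt{n} \\ r_i/\sqrt{n} \\ \vdots \\ r_i/\sqrt{n} \end{pmatrix}
= \begin{pmatrix} \sqrt{n}\,r_i + Cr_i/\sqrt{n} \\ \sqrt{n}\,r_i + Cr_i/\sqrt{n} \\ \vdots \\ \sqrt{n}\,r_i + Cr_i/\sqrt{n} \end{pmatrix}
= (n+\lambda_i)\begin{pmatrix} r_i/\sqrt{n} \\ r_i/\sqrt{n} \\ \vdots \\ r_i/\sqrt{n} \end{pmatrix}$$

and

$$Aq_{p+i} = \begin{pmatrix}
I_p + C & I_p & I_p & \cdots & I_p \\
I_p & I_p + C & I_p & \cdots & I_p \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
I_p & I_p & I_p & \cdots & I_p + C
\end{pmatrix}
\begin{pmatrix} r_i/\sqrt{2} \\ -r_i/\sqrt{2} \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \begin{pmatrix} Cr_i/\sqrt{2} \\ -Cr_i/\sqrt{2} \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \lambda_i\begin{pmatrix} r_i/\sqrt{2} \\ -r_i/\sqrt{2} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Similarly, for $k = 2,\dots,n-1$, $Aq_{kp+i} = \lambda_i q_{kp+i}$. Clearly the $n-1$ vectors corresponding to the root $\lambda_i$ (having multiplicity $n-1$) are linearly independent and hence span the subspace corresponding to $\lambda_i$. (Of course the chosen $q$'s are not the only such set of vectors.) Since $R$ is an orthogonal matrix, it is straightforward to show that $Q$ is also, and (2.2.2) is proved. □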
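Lemma 2.2.3 can also be confirmed numerically. The sketch below (NumPy assumed; the helper name `build_q` is ours, and $C$ is a random symmetric matrix) forms $A = I_n\otimes C + J_n\otimes I_p$, assembles $Q$ from the vectors $q_i$ and $q_{kp+i}$ of the proof, and checks that $Q$ is orthogonal and that $AQ = Q\operatorname{diag}(\alpha)$ with the $\alpha$'s ordered as in (2.2.1).

```python
import numpy as np


def build_q(R, n):
    """Q of (2.2.2): the Kronecker product of an n x n Helmert-type matrix H with R."""
    H = np.zeros((n, n))
    H[:, 0] = 1.0 / np.sqrt(n)                # column giving q_1, ..., q_p
    for k in range(1, n):                     # columns giving q_{kp+1}, ..., q_{kp+p}
        H[:k, k] = 1.0 / np.sqrt(k * (k + 1))
        H[k, k] = -k / np.sqrt(k * (k + 1))
    return np.kron(H, R)


rng = np.random.default_rng(6)
n, p = 4, 3
b = rng.normal(size=(p, p))
C = b @ b.T                                   # any symmetric p x p matrix
lam, R = np.linalg.eigh(C)                    # C R = R diag(lam), R orthogonal
A = np.kron(np.eye(n), C) + np.kron(np.ones((n, n)), np.eye(p))
Q = build_q(R, n)
alpha = np.concatenate([n + lam] + [lam] * (n - 1))  # ordering of (2.2.1)
print(np.allclose(Q.T @ Q, np.eye(n * p)))   # True: Q is orthogonal
print(np.allclose(A @ Q, Q * alpha))         # True: A q_m = alpha_m q_m
```

Writing $Q$ as a Kronecker product makes its orthogonality immediate, since both the Helmert-type matrix and $R$ are orthogonal.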


Theorem 2.2.1. Let $S_{01}, S_{02}, \dots, S_{0n}$ be as defined in paragraph (A) above, with $\mu_0 = \mu$. Then $S_0 = n^{-1}\sum_{i=1}^{n}S_{0i}$ can be represented as

$$S_0 = \sum_{j=1}^{p}\left\{(1 + n^{-1}\lambda_j)W_{1j} + n^{-1}\lambda_j W_{2j}\right\}, \tag{2.2.3}$$

where $\{W_{1j}, W_{2j}\}_{j=1,\dots,p}$ are mutually independent chi-squared random variables, $W_{1j}$ having one degree of freedom and $W_{2j}$ having $n-1$ degrees of freedom, and $\lambda_1,\dots,\lambda_p$ are the characteristic roots of $I_p + 2\Sigma^{-1}\Delta$.

1 P -P  “ ~ 

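Proof: By the assumptions on $S_{01},\dots,S_{0n}$, we can express $S_0$ as

$$S_0 = n^{-1}\sum_{j=1}^{n}U_{0j}'\Sigma^{-1}U_{0j},$$

where $U_{0j} = X_0 - X_j + Y_{0j} - Y_{j0}$ is distributed $N_p(0, 2(\Sigma + \Delta))$. Let $\Sigma^{-1/2}$ be the symmetric square root of $\Sigma^{-1}$ (i.e., $\Sigma^{-1/2}$ is symmetric and $\Sigma^{-1/2}\Sigma^{-1/2} = \Sigma^{-1}$), and let

$$U' = (U_{01}'\Sigma^{-1/2}, U_{02}'\Sigma^{-1/2}, \dots, U_{0n}'\Sigma^{-1/2}).$$

Thus $S_0 = n^{-1}U'U$, and $U$ is distributed normally with mean $0$ and covariance matrix

$$V = \begin{pmatrix} V_{11} & \cdots & V_{1n} \\ \vdots & \ddots & \vdots \\ V_{n1} & \cdots & V_{nn} \end{pmatrix},$$

where $V_{ij} = E\{\Sigma^{-1/2}U_{0i}U_{0j}'\Sigma^{-1/2}\}$. Thus $V_{ii}$ is the covariance matrix of $\Sigma^{-1/2}U_{0i}$, or

$$V_{ii} = 2\Sigma^{-1/2}(\Sigma + \Delta)\Sigma^{-1/2} = 2(I_p + \Omega),$$

where $\Omega = \Sigma^{-1/2}\Delta\Sigma^{-1/2}$. For $i \ne j$, since $EX_0 = EX_i = EX_j = \mu$, $X_0, X_i, X_j$ are independent, and $\operatorname{cov}(X_0) = \Sigma$, it follows that

$$V_{ij} = \Sigma^{-1/2}\Sigma\Sigma^{-1/2} = I_p.$$

Hence

$$V = I_n\otimes(I_p + 2\Omega) + J_n\otimes I_p.$$

But if $S_0 = n^{-1}U'U$ where $U$ is $N(0, V)$, then we can represent $S_0$ as

$$S_0 = n^{-1}\sum_{j=1}^{np}\alpha_j Z_j^2, \tag{2.2.4}$$

where $Z_1,\dots,Z_{np}$ are independent unit normal random variables and $\alpha_1,\dots,\alpha_{np}$ are the characteristic roots of $V$ (see, e.g., Johnson and Kotz [11], chapter 29, section 5). By Lemma 2.2.3, $\alpha_1,\dots,\alpha_{np}$ consist of $n+\lambda_j$, each occurring once, and $\lambda_j$, each occurring $n-1$ times, where $\lambda_1,\dots,\lambda_p$ are the characteristic roots of $I_p + 2\Omega$, or equivalently of $I_p + 2\Sigma^{-1}\Delta$. The result follows immediately on substitution in (2.2.4). □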
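As a numerical sanity check on Theorem 2.2.1, the sketch below (NumPy assumed; $\Sigma$, $\Delta$, $n$, the seed and the function names are arbitrary choices for illustration) simulates $S_0$ directly from model (A) with $\mu_0 = \mu$ and also from representation (2.2.3). Both should match the mean $\sum_j(1+\lambda_j)$ and variance $\sum_j\{2(1+\lambda_j/n)^2 + 2(n-1)(\lambda_j/n)^2\}$ implied by (2.2.3).

```python
import numpy as np


def representation_coefficients(sigma, delta, n):
    """Roots lambda_j of I_p + 2 Sigma^{-1} Delta and the weights in (2.2.3)."""
    p = sigma.shape[0]
    m = np.eye(p) + 2 * np.linalg.solve(sigma, delta)
    lam = np.sort(np.linalg.eigvals(m).real)  # real: m is similar to I + 2*Omega
    return lam, 1 + lam / n, lam / n          # weights on W_1j and on W_2j


def sample_s0_direct(n, sigma, delta, reps, rng):
    """Simulate S_0 from model (A) with mu_0 = mu = 0."""
    p = sigma.shape[0]
    zero = np.zeros(p)
    x0 = rng.multivariate_normal(zero, sigma, size=(reps, 1))  # X_0, shared by all i
    x = rng.multivariate_normal(zero, sigma, size=(reps, n))   # X_1, ..., X_n
    y0 = rng.multivariate_normal(zero, delta, size=(reps, n))  # Y_0i
    yi = rng.multivariate_normal(zero, delta, size=(reps, n))  # Y_i0
    u = x0 - x + y0 - yi                                       # U_0i, shape (reps, n, p)
    w = np.linalg.solve(sigma, u.reshape(-1, p).T).T.reshape(u.shape)
    return np.einsum("rip,rip->r", u, w) / n


def sample_s0_representation(n, sigma, delta, reps, rng):
    """Simulate S_0 from the chi-squared representation (2.2.3)."""
    lam, c1, c2 = representation_coefficients(sigma, delta, n)
    w1 = rng.chisquare(1, size=(reps, lam.size))
    w2 = rng.chisquare(n - 1, size=(reps, lam.size))
    return (c1 * w1 + c2 * w2).sum(axis=1)


rng = np.random.default_rng(7)
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
delta = np.array([[0.1, 0.0], [0.0, 0.2]])
n = 5
direct = sample_s0_direct(n, sigma, delta, 20000, rng)
rep = sample_s0_representation(n, sigma, delta, 20000, rng)
print(direct.mean(), rep.mean())  # both near sum_j (1 + lambda_j)
print(direct.var(), rep.var())    # both near the variance implied by (2.2.3)
```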


Since $S_0$ can be represented in the form given in (2.2.4), it is clearly a quadratic form in normal variables. Consequently, we would expect that evaluation of quantities like $\Pr(S_0 > s)$ would involve straightforward application of one of the well-known methods, such as the expansions described in chapter 29 of Johnson and Kotz [11]. Unfortunately, that is not the case. The reason is that, for reasonably rapid convergence of these expansions, the coefficients (the $n^{-1}\alpha_j$'s in (2.2.4)) should be about the same size. In the situation we have, it would be reasonable to assume that the measurement error made in determining the distances is small relative to the overall variability in the problem; otherwise, there would be little hope of drawing meaningful conclusions. Thus the characteristic roots of $\Sigma^{-1}\Delta$ can be assumed to be small, certainly no larger than one, which in turn implies that $\lambda_1,\dots,\lambda_p$ are about one or a little larger. Thus $p$ of the coefficients in (2.2.4) are close to one and the other $(n-1)p$ are close to zero. Because of this, the expansions mentioned above converge too slowly to be of any practical use.

However, $\Pr(S_0 > s)$ gives us the probability of misclassifying an individual which is truly from the population, a quantity which we wish to be able to compute. Before examining the problem of evaluating $\Pr(S_0 > s)$ in general, let us consider a special case. Specifically, suppose that $\Sigma$ and $\Delta$ are related by $\Delta = \delta\Sigma$, where $0 \le \delta < 1$; that is, the variability of the measurement error is similar in structure to the population variability, but smaller in magnitude. Since, as mentioned above, we may assume that the measurement error is small compared to the population variability, restricting $\delta < 1$ would seem to be reasonable. The additional restriction that $\Delta$ be a constant multiple of $\Sigma$ may be valid in some cases, and it leads to considerable simplification of the distribution in question. Under these conditions, $\Sigma^{-1}\Delta = \delta I_p$ and we have $\lambda_1 = \dots = \lambda_p = 1 + 2\delta$. Let us denote the common value by $\lambda$. The representation (2.2.3) then becomes, by the reproductive property of chi-squared random variables,

$$S_0 = (1 + n^{-1}\lambda)W_1 + n^{-1}\lambda W_2, \tag{2.2.5}$$

where $W_1$ and $W_2$ are independent chi-squared random variables with $p$ and $(n-1)p$ degrees of freedom, respectively. We now wish to obtain an expression for $\Pr(S_0 > s)$ when $S_0$ has the form (2.2.5). First we prove a lemma.

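Lemma 2.2.4. Let $p \ge 2$ be an even integer, $v$ an arbitrary positive integer, and $a, b$ arbitrary positive constants such that $a > b$. Let $X$ and $Y$ be distributed independently as chi-squared random variables with $p$ and $v$ degrees of freedom respectively, and let $Z = aX + bY$. Then

$$\Pr(Z > z) = 1 - F_v\!\left(\frac{z}{b}\right) + \left(\frac{a}{a-b}\right)^{v/2} \exp\!\left(-\frac{z}{2a}\right) \sum_{\ell=0}^{q} \left(-\frac{b}{a-b}\right)^{\ell} \left(\tfrac12 v\right)_{\ell}\, h_\ell^{(q)}\!\left(\frac{z}{2a}\right) F_{v+2\ell}\!\left(\frac{z(a-b)}{ab}\right), \tag{2.2.6}$$

where $F_v(x)$ is the cumulative distribution function of a $\chi^2_v$ random variable, $q = \tfrac12 p - 1$, $(x)_a = \Gamma(x+a)/\Gamma(x)$, and for $\ell = 0, 1, \dots, q$,

$$h_\ell^{(q)}(x) = \frac{1}{\ell!} \sum_{k=0}^{q-\ell} \frac{x^k}{k!} . \tag{2.2.7}$$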


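Proof: By definition, we have

$$\Pr(Z > z) = \Pr(aX + bY > z) = \Pr\!\left(Y > \frac{z}{b}\right) + \Pr\!\left(Y \le \frac{z}{b},\ X > \frac{z - bY}{a}\right)$$

$$= 1 - F_v\!\left(\frac{z}{b}\right) + \int_{t=0}^{b^{-1}z} \Pr\!\left(X > \frac{z - bt}{a}\right) f_v(t)\,dt , \tag{2.2.8}$$

where

$$f_v(t) = \frac{(\tfrac12 t)^{\frac12 v - 1}\, e^{-\frac12 t}}{2\,\Gamma(\tfrac12 v)}$$

is the probability density function of a $\chi^2_v$ random variable. But for $p$ even and $q = \tfrac12 p - 1$,

$$\Pr(X > x) = e^{-x/2} \sum_{j=0}^{q} \frac{(\tfrac12 x)^j}{j!} .$$

Hence, letting $I(z)$ be the integral in (2.2.8), we have

$$I(z) = \int_{t=0}^{b^{-1}z} \exp\!\left(-\frac{z - bt}{2a}\right) \left\{ \sum_{j=0}^{q} \frac{1}{j!}\left(\frac{z - bt}{2a}\right)^{j} \right\} f_v(t)\,dt$$

$$= e^{-z/2a} \sum_{j=0}^{q} \sum_{\ell=0}^{j} \frac{1}{\ell!\,(j-\ell)!} \left(\frac{z}{2a}\right)^{j-\ell} \left(-\frac{b}{a}\right)^{\ell} \int_{t=0}^{b^{-1}z} (\tfrac12 t)^{\ell}\, e^{bt/2a}\, f_v(t)\,dt$$

$$= e^{-z/2a} \sum_{j=0}^{q} \sum_{\ell=0}^{j} \frac{1}{\ell!\,(j-\ell)!} \left(\frac{z}{2a}\right)^{j-\ell} \left(-\frac{b}{a}\right)^{\ell} \int_{t=0}^{b^{-1}z} \frac{(\tfrac12 t)^{\frac12 v + \ell - 1} \exp\{-\tfrac12 t(1 - ba^{-1})\}}{2\,\Gamma(\tfrac12 v)}\,dt .$$

Making the change of variable $u = t(1 - ba^{-1})$ and rearranging terms, we obtain

$$I(z) = e^{-z/2a} \sum_{\ell=0}^{q} \left(-\frac{b}{a}\right)^{\ell} \left\{ \sum_{j=\ell}^{q} \frac{(z/2a)^{j-\ell}}{\ell!\,(j-\ell)!} \right\} \left(\frac{a}{a-b}\right)^{\frac12 v + \ell} \int_{u=0}^{z(a-b)/(ab)} \frac{(\tfrac12 u)^{\frac12 v + \ell - 1}\, e^{-\frac12 u}}{2\,\Gamma(\tfrac12 v)}\,du . \tag{2.2.9}$$

But the integral in (2.2.9) is $(\tfrac12 v)_\ell\, F_{v+2\ell}\{z(a-b)/(ab)\}$, and the term in braces is $h_\ell^{(q)}(z/2a)$ as defined in (2.2.7) (with an index shift). Hence the result follows immediately on substitution and cancellation of terms in (2.2.9). □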


Theorem 2.2.2. Suppose $p$ is even, and $S_{01},\dots,S_{0n}$ are as defined in paragraph (A) on pages 8-9, with $\mu_1 = \mu$, $\Lambda = \delta\Sigma$, and $S_0 = n^{-1}\sum_{j=1}^{n} S_{0j}$. Then

$$\Pr(S_0 > s) = 1 - F_v(ns\lambda^{-1}) + \exp\!\left\{-\frac{\tfrac12 s}{1 + n^{-1}\lambda}\right\}(1 + n^{-1}\lambda)^{\frac12 v} \sum_{\ell=0}^{q} (-n^{-1}\lambda)^{\ell}\, (\tfrac12 v)_{\ell}\, h_\ell^{(q)}\!\left(\frac{\tfrac12 s}{1 + n^{-1}\lambda}\right) F_{v+2\ell}\!\left(\frac{s n^2}{n\lambda + \lambda^2}\right), \tag{2.2.10}$$

where $q = \tfrac12 p - 1$, $v = p(n-1)$, $\lambda = 1 + 2\delta$, $F_v(x)$ is the cumulative distribution function of a $\chi^2_v$ random variable, and $h_\ell^{(q)}(x)$ is a polynomial of degree $q - \ell$ in $x$, as defined in (2.2.7).

Proof: We have already shown that, under the conditions of this theorem, $S_0$ can be represented as in (2.2.5), that is,

$$S_0 = (1 + n^{-1}\lambda) W_1 + n^{-1}\lambda W_2 ,$$

where $W_1$ and $W_2$ are independent chi-squared random variables with $p$ and $p(n-1)$ degrees of freedom respectively and $\lambda = 1 + 2\delta$. Thus the proof follows directly from Lemma 2.2.4 by substituting $a = 1 + n^{-1}\lambda$, $b = n^{-1}\lambda$ and $v = p(n-1)$. Here $a - b = 1$, which allows (2.2.6) to be simplified to the form given here. □


While the expression (2.2.10) appears to be rather complicated, the values of $F_v(x)$ are tabulated or can be easily computed or approximated. Moreover, for dimensions of practical interest, the sum in (2.2.10) has very few terms. The peculiar form of the linear combination of chi-squared random variables with which we are confronted makes the usual computational methods of little or no use. Even so, that form can also be exploited, as in Theorem 2.2.2, to give a practical means of computing the probabilities of interest, at least when $p$ is even.
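The following is a minimal Python sketch of (2.2.6) and of its special case (2.2.10). It relies on two stated assumptions:

- $v$ is taken to be even. This always holds in Theorem 2.2.2, where $v = p(n-1)$ with $p$ even. It lets every $F_v$ come from the same finite sum used for $\Pr(X > x)$ in the proof of Lemma 2.2.4.
- The names `lemma_2_2_4` and `theorem_2_2_2` are our own.

The checks against Tables 2.4.1(a) and (b) read the tables' $\omega$ as the $\delta$ of Theorem 2.2.2. That interpretation fits the tabulated values but is not stated in the text.

```python
import math


def chi2_sf_even(x: float, k: int) -> float:
    """Pr(chi2_k > x) for even k: exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # terms are formed in log space so that exp(-h) cannot underflow on its own
    return sum(math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) for j in range(k // 2))


def lemma_2_2_4(z: float, a: float, b: float, p: int, v: int) -> float:
    """Pr(aX + bY > z), X ~ chi2_p, Y ~ chi2_v, a > b > 0, p and v even; eq. (2.2.6)."""
    if p % 2 or p < 2 or not a > b > 0:
        raise ValueError("need even p >= 2 and a > b > 0")
    q = p // 2 - 1
    x = z / (2 * a)
    upper = z * (a - b) / (a * b)
    total = 0.0
    poch = 1.0  # Pochhammer symbol (v/2)_ell, starting at ell = 0
    for ell in range(q + 1):
        # h_ell^{(q)}(x) of (2.2.7)
        h = sum(x ** k / math.factorial(k) for k in range(q - ell + 1)) / math.factorial(ell)
        cdf = 1.0 - chi2_sf_even(upper, v + 2 * ell)   # F_{v+2ell}(z(a-b)/(ab))
        total += (-b / (a - b)) ** ell * poch * h * cdf
        poch *= v / 2 + ell
    return chi2_sf_even(z / b, v) + (a / (a - b)) ** (v / 2) * math.exp(-x) * total


def theorem_2_2_2(s: float, p: int, n: int, delta: float) -> float:
    """Pr(S_0 > s) when Lambda = delta * Sigma, eq. (2.2.10): a = 1 + lam/n, b = lam/n."""
    lam = 1 + 2 * delta
    return lemma_2_2_4(s, a=1 + lam / n, b=lam / n, p=p, v=p * (n - 1))


if __name__ == "__main__":
    for s in (6, 8, 10, 12):
        print(s, round(theorem_2_2_2(s, p=2, n=10, delta=0.25), 5))
```

Under this reading, the loop in the `__main__` block should reproduce the $n = 10$, $\omega = .25$ rows of the $p = 2$ table. Only the $s = 6$ entry (.25850) and the $p = 4$, $s = 12$ entry (.23599) were checked by hand.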




For odd values of $p$, there does not seem to be a simple closed form for the probability. It is true that if $X$ is a chi-squared random variable with $p = 2q - 1$ degrees of freedom ($q$ a positive integer), then

$$\Pr(X > x) = 2\left[1 - \Phi(\sqrt{x})\right] + e^{-x/2} \sum_{j=0}^{q-2} \frac{(\tfrac12 x)^{j + 1/2}}{\Gamma(j + 3/2)} ,$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. This expression does allow us to write an expression expanding (2.2.8) in this case. However, here the term in braces in the expression for $I(z)$ no longer yields a linear combination of chi-squared probabilities. Instead it yields a more general combination of confluent hypergeometric functions. In addition, an integral involving $\Phi(\cdot)$ results, and it does not appear to have a closed form. It could, of course, be evaluated numerically.
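As a sketch of that numerical route, the Python code below evaluates (2.2.8) for any integer $p$. For odd degrees of freedom it uses the expression in terms of $\Phi$ given above, written via $\operatorname{erfc}$ since $2[1-\Phi(\sqrt{x})] = \operatorname{erfc}(\sqrt{x/2})$. The quadrature choices are our own and not from the original: composite Simpson's rule, a fixed number of panels, and $v \ge 2$ so that $f_v$ is bounded at $t = 0$. The helper names are hypothetical.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Pr(chi2_k > x) for integer k >= 1 (even k: finite sum; odd k: erfc plus finite sum)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        return sum(math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) for j in range(k // 2))
    tail = math.erfc(math.sqrt(h))  # = 2[1 - Phi(sqrt(x))]
    return tail + sum(math.exp(-h + (j + 0.5) * math.log(h) - math.lgamma(j + 1.5))
                      for j in range((k - 1) // 2))


def chi2_pdf(t: float, v: int) -> float:
    """Density f_v(t) = (t/2)^{v/2-1} e^{-t/2} / (2 Gamma(v/2)); finite at 0 for v >= 2."""
    if t <= 0:
        return 0.5 if v == 2 else 0.0
    return math.exp((v / 2 - 1) * math.log(t / 2) - t / 2 - math.log(2) - math.lgamma(v / 2))


def tail_by_quadrature(z: float, a: float, b: float, p: int, v: int, panels: int = 2000) -> float:
    """Pr(aX + bY > z) from (2.2.8), integrating over 0 <= t <= z/b with Simpson's rule."""
    if v < 2 or panels % 2:
        raise ValueError("need v >= 2 and an even number of panels")
    upper = z / b
    step = upper / panels
    acc = 0.0
    for i in range(panels + 1):
        t = i * step
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        acc += weight * chi2_sf((z - b * t) / a, p) * chi2_pdf(t, v)
    return chi2_sf(upper, v) + acc * step / 3


if __name__ == "__main__":
    for p in (1, 2, 3, 4, 5):
        print(p, tail_by_quadrature(9.0, 1.15, 0.15, p, 18))
```

For even $p$ this gives an independent cross-check of the closed form in Theorem 2.2.2. For odd $p$ it is simply the numerical evaluation the text mentions.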

Consider the representation of $S_0$ as

$$S_0 = \sum_{j=1}^{p} (1 + n^{-1}\lambda_j) W_{1j} + \sum_{j=1}^{p} n^{-1}\lambda_j W_{2j} ,$$

with $W_{1j}$ and $W_{2j}$ the appropriate chi-squared random variables and the $\lambda$'s fixed constants. Because of it, if $n$ is large, the first component above is distributed much like a $\chi^2_p$ random variable and the second component behaves much like a constant. In fact, it is simple to show that this is the limiting distribution as $n \to \infty$.

Theorem 2.2.3. If $\lambda_1,\dots,\lambda_p$ and $S_0$ are as given in Theorem 2.2.1, then the limiting distribution of $S_0$ as $n \to \infty$ is that of $U + \sum_{j=1}^{p}\lambda_j$, where $U$ is distributed as $\chi^2_p$.


Proof: The characteristic function of a $\chi^2_\nu$ random variable is $(1 - 2it)^{-\nu/2}$. Thus the characteristic function of $S_0$ is

$$\phi(t) = \prod_{j=1}^{p} \left\{ [1 - 2it(1 + n^{-1}\lambda_j)]^{-1/2}\, [1 - 2itn^{-1}\lambda_j]^{-\frac12(n-1)} \right\} = \prod_{j=1}^{p} A_j(t)\, B_j(t)\, C_j(t) , \tag{2.2.11}$$

where $A_j(t) = (1 - 2it - 2itn^{-1}\lambda_j)^{-1/2}$, $B_j(t) = (1 - 2itn^{-1}\lambda_j)^{-n/2}$, and $C_j(t) = (1 - 2itn^{-1}\lambda_j)^{1/2}$. But as $n \to \infty$, $A_j(t) \to (1 - 2it)^{-1/2}$, $B_j(t) \to \exp(it\lambda_j)$ and $C_j(t) \to 1$. Thus as $n \to \infty$,

$$\phi(t) \to (1 - 2it)^{-\frac12 p} \exp\!\left(it\sum_{j=1}^{p}\lambda_j\right) ,$$

which is the characteristic function of $U + \sum_{j=1}^{p}\lambda_j$, and the result follows by the uniqueness of characteristic functions. □
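The Python sketch below gives a quick numerical illustration of Theorem 2.2.3; the helper names are hypothetical. It evaluates the product (2.2.11) directly in complex arithmetic. It then measures the distance from the limiting characteristic function $(1-2it)^{-p/2}\exp(it\sum_j\lambda_j)$ at a single value of $t$. Principal branches of the complex power and logarithm are used. That is appropriate here because every base has real part 1. This is a sanity check of the limit, not a substitute for the proof.

```python
import cmath


def cf_s0(t: float, lambdas, n: int) -> complex:
    """Characteristic function (2.2.11) of S_0 for fixed roots lambda_1..lambda_p."""
    phi = 1 + 0j
    for lam in lambdas:
        # W_1j ~ chi2_1, scaled by 1 + lam/n
        phi *= (1 - 2j * t * (1 + lam / n)) ** -0.5
        # W_2j ~ chi2_{n-1}, scaled by lam/n; the principal log is valid since Re = 1 > 0
        phi *= cmath.exp(-(n - 1) / 2 * cmath.log(1 - 2j * t * lam / n))
    return phi


def cf_limit(t: float, lambdas) -> complex:
    """Characteristic function of U + sum(lambdas) with U ~ chi2_p."""
    p = len(lambdas)
    return (1 - 2j * t) ** (-p / 2) * cmath.exp(1j * t * sum(lambdas))


if __name__ == "__main__":
    roots = [1.5, 1.2]  # arbitrary illustrative roots
    for n in (10, 100, 1000, 10_000):
        print(n, abs(cf_s0(0.7, roots, n) - cf_limit(0.7, roots)))
```

The printed distances should shrink roughly like $1/n$, in line with the factors $A_j$, $B_j$ and $C_j$ each differing from their limits by $O(n^{-1})$.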

Of course, the limiting distribution gives us an approximation to the distribution of $S_0$ for large $n$. Another condition that leads to a relatively simple approximation is that $\Lambda$ be small. We saw earlier that if $\Lambda = 0$, that is, when there is no distortion from measurement (or perception) error, the test based on $S_0$ is approximately equivalent to the two-sample likelihood ratio test for equality of means. In practical situations, $\Lambda$ may be small compared to $\Sigma$, and the limiting distribution as $\Lambda \to 0$ can be used as an approximation. We already have the characteristic function of $S_0$ in (2.2.11). But as $\Lambda \to 0$, $\lambda_j \to 1$ for all $j$, and hence as $\Lambda \to 0$,

$$\phi(t) \to [1 - 2it(1 + n^{-1})]^{-\frac12 p}\, [1 - 2itn^{-1}]^{-\frac12 p(n-1)} .$$

By inversion of the characteristic function, the distribution of $S_0$ tends, as $\Lambda \to 0$, to that of $(1 + n^{-1})U + n^{-1}V$. Here $U$ and $V$ are independent and distributed as $\chi^2_p$ and $\chi^2_{(n-1)p}$ respectively. This distribution has already been given: it is the special case in Theorem 2.2.2 with $\delta = 0$, that is, $\lambda = 1$.

2.3 Asymptotic expansion for the distribution of $S_0$.

While the exact distribution of $S_0$ is difficult to compute in general, the limiting distribution as $n \to \infty$ is quite simple and is easily computed. It is therefore of interest to develop an asymptotic expansion for the distribution. Such an expansion indicates the rate of convergence to the limit distribution and provides an approximation that is easier to compute than the exact distribution. In this section we shall develop such an expansion. First, however, we need two lemmas.

Lemma 2.3.1. Let $X$ be a chi-squared random variable with $p$ degrees of freedom. Let $Y$ be a non-negative random variable (depending on $n$), independent of $X$, with $EY = O(1)$ as $n \to \infty$ and with $r$-th central moment $\mu_r$ of order $O(n^{-[r/2]})$ for all integers $r \ge 2$, $[t]$ denoting the integer part of $t$. Further, let $S = X + Y$ and let $\mu^*$ denote a bound for $EY$. Then for $s > \mu^*$, $r \ge 1$,

$$\Pr(S > s) = \Pr(X > s - EY) + \sum_{j=2}^{2r} \frac{c_j\,\mu_j}{j!} + o(n^{-r}) , \tag{2.3.1}$$

where $c_j = \left.\dfrac{d^{j-1}}{dy^{j-1}} f_p(s - y)\right|_{y = EY}$ and $f_p(x)$ is the density function of $X$.

With $c_0$ arbitrary, the coefficients $\{c_j\}_{j = 0, 1, 2, \dots}$ satisfy the following recurrence relation:

(2.3.2)


Proof: Throughout the proof, $Y$ and its distribution and moments depend on $n$, but that dependence is suppressed notationally for simplicity.

Let $G(y)$ be the distribution function of $Y$, $\mu = EY$, and

$$h_s(y) = \Pr(X > s - y) = \int_{s-y}^{\infty} f_p(x)\,dx .$$

Then

$$\Pr(S > s) = E[h_s(Y)] = \int_0^\infty h_s(y)\,dG(y) .$$

We wish to expand $h_s(y)$ about $y = \mu$ using Taylor's formula and integrate the result. However, $h_s(y)$ may not satisfy the necessary conditions on continuity and differentiability over the entire range of integration. Specifically, the $j$-th derivative with respect to $y$, $h_s^{(j)}(y)$, has a discontinuity at $y = s$ if $j > \tfrac12(p-2)$. To avoid these difficulties, let $b$ be a fixed constant such that $\mu^* < b < s$. Since $b - \mu^* > 0$ (independent of $n$), we have by a Chebyshev-type inequality

$$\Pr(Y > b) \le \Pr(|Y - \mu| > b - \mu^*) \le \frac{\mu_{2r+2}}{(b - \mu^*)^{2r+2}} .$$

But for $r \ge 1$, $\mu_{2r+2} = O(n^{-r-1})$. Thus $\Pr(Y > b) = o(n^{-r})$ for any $r \ge 1$.

On the interval $0 \le y \le b$, $h_s(y)$ is everywhere differentiable an arbitrary number of times for any $p \ge 1$. Thus on that interval we can expand $h_s(y)$ using Taylor's formula, obtaining for $r \ge 1$

$$h_s(y) = h_s(\mu) + \sum_{j=1}^{2r} \frac{(y-\mu)^j\, h_s^{(j)}(\mu)}{j!} + \frac{(y-\mu)^{2r+1}\, h_s^{(2r+1)}(\xi)}{(2r+1)!} , \tag{2.3.3}$$

where $\xi$ is between $y$ and $\mu$, and therefore also $0 \le \xi \le b$. Since

$$\Pr(S > s) = \int_0^b h_s(y)\,dG(y) + \int_b^\infty h_s(y)\,dG(y) \tag{2.3.4}$$

and

$$\mu_j = \int_0^b (y-\mu)^j\,dG(y) + \int_b^\infty (y-\mu)^j\,dG(y) , \tag{2.3.5}$$

substitution of (2.3.3) and (2.3.5) in (2.3.4), together with the fact that $\mu_1 = 0$, yields

$$\Pr(S > s) = h_s(\mu) + \sum_{j=2}^{2r} \frac{c_j\,\mu_j}{j!} + R_r , \tag{2.3.6}$$

where

$$R_r = \left[\int_0^b \frac{(y-\mu)^{2r+1}}{(2r+1)!}\, h_s^{(2r+1)}(\xi)\,dG(y)\right] + \left[\int_b^\infty h_s(y)\,dG(y) - \int_b^\infty h_s(\mu)\,dG(y)\right] - \left[\sum_{j=1}^{2r} \frac{h_s^{(j)}(\mu)}{j!}\int_b^\infty (y-\mu)^j\,dG(y)\right] = T_1 + T_2 - T_3 \quad \text{(say)},$$

and $c_2,\dots,c_{2r}$ are as defined in the lemma (note that $h_s'(y) = f_p(s - y)$, so $c_j = h_s^{(j)}(\mu)$). Because $\mu^* < b < s$, $|h_s^{(2r+1)}(y)|$ is bounded for $0 \le y \le b$, by $M$ say.

Let $\nu_r$ be the $r$-th absolute central moment of $Y$. By Liapounoff's inequality (see, e.g., Kendall and Stuart [13], p. 62), $\nu_{2r+1}^2 \le \nu_{2r}\,\nu_{2r+2}$. For $r \ge 1$, $\nu_{2r} = O(n^{-r})$ and $\nu_{2r+2} = O(n^{-r-1})$, so we have $\nu_{2r+1} = O(n^{-r-1/2})$. Therefore it follows that for $r \ge 1$

$$|T_1| \le \frac{M}{(2r+1)!}\int_0^b |y-\mu|^{2r+1}\,dG(y) \le \frac{M}{(2r+1)!}\,\nu_{2r+1} = o(n^{-r}) .$$

Also, since $0 \le h_s(y) = \Pr(X > s - y) \le 1$ for all $y$,

$$|T_2| \le 2\int_b^\infty dG(y) = o(n^{-r}) .$$

For $T_3$, let $T_{3j} = \int_b^\infty (y-\mu)^j\,dG(y)$ and let

$$u(y) = \begin{cases} 1 & \text{if } |y-\mu| \le 1, \\ 0 & \text{if } |y-\mu| > 1 . \end{cases}$$

Then for $r \ge 1$

$$\left|\int_b^\infty u(y)\,(y-\mu)^j\,dG(y)\right| \le \int_b^\infty dG(y) = o(n^{-r}) ,$$

and for $1 \le j \le 2r$, $r \ge 1$,

$$\left|\int_b^\infty [1-u(y)]\,(y-\mu)^j\,dG(y)\right| \le \int_b^\infty |y-\mu|^{2r+1}\,dG(y) \le \nu_{2r+1} = o(n^{-r}) .$$

Hence

$$|T_{3j}| \le \left|\int_b^\infty u(y)\,(y-\mu)^j\,dG(y)\right| + \left|\int_b^\infty [1-u(y)]\,(y-\mu)^j\,dG(y)\right| = o(n^{-r}) ,$$

and

$$|T_3| \le \sum_{j=1}^{2r} \frac{|h_s^{(j)}(\mu)\,T_{3j}|}{j!} = o(n^{-r}) .$$

Thus $|R_r| \le |T_1| + |T_2| + |T_3| = o(n^{-r})$ and (2.3.1) is proved.

To prove the recurrence relation, let $z = \tfrac12(s - y)$, so that $f_p(s - y) = f_p(2z)$. It follows by induction, since $dz/dy = -\tfrac12$, that

$$\frac{d^r}{dy^r} f_p(2z) = (-\tfrac12)^r\, \frac{d^r}{dz^r} f_p(2z) .$$

Now $f_p(2z) = \tfrac12 z^{\frac12 p - 1} e^{-z}/\Gamma(\tfrac12 p)$, and

$$\frac{d^r}{dz^r}\left[z^{\frac12 p - 1} e^{-z}\right] = r!\,\left[L_r^{(\frac12 p - r - 1)}(z)\right] z^{\frac12 p - r - 1}\, e^{-z} ,$$

where

$$L_r^{(\gamma)}(u) = \sum_{j=0}^{r} \binom{r+\gamma}{r-j} \frac{(-u)^j}{j!} \tag{2.3.7}$$

is the generalized Laguerre polynomial. It is properly defined only for $\gamma > -1$, but it can be defined formally as in (2.3.7) for all $\gamma$. This uses the convention that $\binom{m}{k} = 0$ if $m$ is a positive integer and $k > m$, or if $k$ is a negative integer.

Thus we have

$$\frac{d^r}{dy^r} f_p(s - y) = (-\tfrac12)^r\, r!\, L_r^{(\frac12 p - r - 1)}(z)\, z^{-r} f_p(2z) .$$

We can now use known relations involving Laguerre polynomials (see, e.g., Gradshteyn and Ryzhik [7], p. 1037) to obtain the desired recurrence relation. In particular, we use the three relations

$c_1$ is defined to be $f_p(s - EY)$. We can define $c_0$ arbitrarily, and this relation will still hold for $j = 0$. Thus we have obtained the desired relation (2.3.2). We note that (2.3.2) can also be derived directly, using Leibnitz' rule for the derivative of a product. □
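The Python sketch below computes the coefficients $c_j$ of (2.3.1) directly from the Laguerre form just derived, rather than through the recurrence (2.3.2). That form is $\frac{d^r}{dy^r} f_p(s-y) = (-\frac12)^r r!\,L_r^{(\frac12 p-r-1)}(z)\, z^{-r} f_p(2z)$, with $r = j-1$ and $z = \frac12(s - EY)$. The generalized binomial coefficient implements the convention stated after (2.3.7). The function names are ours, and the tail term in `expansion` is written only for even $p$.

```python
import math


def gbinom(x: float, k: int) -> float:
    """Generalized binomial coefficient C(x, k); taken to be zero for negative k."""
    if k < 0:
        return 0.0
    out = 1.0
    for i in range(k):
        out *= (x - i) / (i + 1)
    return out


def laguerre(r: int, gamma: float, u: float) -> float:
    """L_r^{(gamma)}(u), defined formally as in (2.3.7) for any gamma."""
    return sum(gbinom(r + gamma, r - j) * (-u) ** j / math.factorial(j) for j in range(r + 1))


def chi2_pdf(x: float, p: int) -> float:
    """f_p(x) = (x/2)^{p/2-1} e^{-x/2} / (2 Gamma(p/2)) for x > 0."""
    return math.exp((p / 2 - 1) * math.log(x / 2) - x / 2 - math.log(2) - math.lgamma(p / 2))


def c_coeff(j: int, s: float, mu: float, p: int) -> float:
    """c_j = d^{j-1}/dy^{j-1} f_p(s - y) at y = mu, via the Laguerre form (needs s > mu)."""
    r = j - 1
    z = (s - mu) / 2
    return (-0.5) ** r * math.factorial(r) * laguerre(r, p / 2 - r - 1, z) * z ** (-r) * chi2_pdf(2 * z, p)


def expansion(s: float, mu: float, moments: dict, p: int, r: int) -> float:
    """Pr(X > s - mu) + sum_{j=2}^{2r} c_j mu_j / j!, i.e. (2.3.1) without o(n^-r); even p only."""
    h = (s - mu) / 2
    tail = math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(p // 2))
    return tail + sum(c_coeff(j, s, mu, p) * moments[j] / math.factorial(j) for j in range(2, 2 * r + 1))
```

For $p = 2$ every Laguerre polynomial above collapses to a single term, giving $c_j = 2^{-j} e^{-(s - EY)/2}$. That closed form, together with finite differences of $f_p$ for an odd $p$, is a convenient check of the implementation.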


Lemma 2.3.2. Let $Y$ be a random variable (depending on $n$) with all moments and cumulants existing, and such that the $r$-th cumulant, $\kappa_r$, is of order $O(n^{1-r})$ for all $r \ge 2$. Then for $r \ge 2$, the $r$-th central moment of $Y$, $\mu_r$, is of order $O(n^{-r+[r/2]})$, $[t]$ denoting the integer part of $t$.

Proof: It is well known that, subject to conditions of existence, we can write

$$\mu_r = \sum_{m} \sum_{j_1,\dots,j_m} a_{j_1 j_2 \cdots j_m}\, \kappa_{j_1}\kappa_{j_2}\cdots\kappa_{j_m} ,$$

where $j_1,\dots,j_m \ge 2$, $\sum_i j_i = r$, and the $a$'s are constants not depending on the distribution of $Y$ (see Kendall and Stuart [13], p. 70). But $\kappa_{j_i} = O(n^{1-j_i})$. Hence

$$\mu_r = O\!\left(n^{\sum_{i=1}^{m}(1 - j_i)}\right) = O(n^{m-r}) ,$$

where $m$ is the maximum number of cumulants in any term of the sum. Clearly there is always a term involving $[r/2]$ cumulants, and no term involves more. For $r$ even it is the term with $j_1 = j_2 = \dots = j_{r/2} = 2$. For $r$ odd it is the term with $j_1 = 3$, $j_2 = \dots = j_{[r/2]} = 2$. Thus $m = [r/2]$. □
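The proof above rests on writing central moments as polynomials in cumulants. The Python sketch below carries out that step with the standard recursion $\mu_r = \sum_{k=2}^{r}\binom{r-1}{k-1}\kappa_k\mu_{r-k}$, where $\mu_0 = 1$ and $\mu_1 = 0$. This is a convenient substitute for the explicit sum in the proof, not the author's formulation. It applies the recursion to $V = n^{-1}\sum_j\lambda_jW_{2j}$ of Theorem 2.3.1, whose cumulants $\kappa_m = 2^{m-1}(m-1)!\,(n-1)\,n^{-m}\sum_j\lambda_j^m$ are of order $O(n^{1-m})$, as the lemma requires. The roots and sample sizes are arbitrary illustrative values.

```python
import math


def central_moments(kappa: dict, rmax: int) -> list:
    """mu_0..mu_rmax from cumulants kappa_2..kappa_rmax (the mean is removed, so mu_1 = 0)."""
    mu = [1.0, 0.0]
    for r in range(2, rmax + 1):
        mu.append(sum(math.comb(r - 1, k - 1) * kappa[k] * mu[r - k] for k in range(2, r + 1)))
    return mu


def cumulants_of_v(lambdas, n: int, rmax: int) -> dict:
    """kappa_m of V = n^{-1} sum_j lambda_j W_j, independent W_j ~ chi2_{n-1}."""
    return {m: 2 ** (m - 1) * math.factorial(m - 1) * (n - 1) * sum(l ** m for l in lambdas) / n ** m
            for m in range(2, rmax + 1)}


if __name__ == "__main__":
    roots = [1.5, 1.2]
    for n in (100, 1000, 10_000):
        mu = central_moments(cumulants_of_v(roots, n, 6), 6)
        # Lemma 2.3.2: mu_r = O(n^{-r+[r/2]}), so these rescaled values should settle down
        print(n, [mu[r] * n ** (r - r // 2) for r in range(2, 7)])
```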


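Theorem 2.3.1. Let $S_{01}, S_{02}, \dots, S_{0n}$ be as defined in paragraph (A) on pages 8-9 with $\mu_1 = \mu$, and let $S_0 = n^{-1}\sum_{j=1}^{n} S_{0j}$. Let $\lambda_1, \dots, \lambda_p$ be the characteristic roots of $I_p + 2\Sigma^{-1}\Lambda$ and

$$V = n^{-1}\sum_{j=1}^{p} \lambda_j W_{2j} ,$$

where $W_{21}, \dots, W_{2p}$ are independent chi-squared random variables with $n-1$ degrees of freedom. Then for $s > \sum_{j=1}^{p}\lambda_j$, $r \ge 1$,

$$\Pr(S_0 > s) = d_0 \Pr[\chi^2_p > (s - EV)/\beta] + \sum_{k=1}^{r}\left\{ d_k \Pr[\chi^2_{p+2k} > (s - EV)/\beta] + \sum_{j=1}^{k} d_{k-j}\left[\frac{a_{k-j,2j-1}\,\mu_{2j-1}}{(2j-1)!} + \frac{a_{k-j,2j}\,\mu_{2j}}{(2j)!}\right]\right\} + o(n^{-r}) , \tag{2.3.9}$$

where

$$\beta = 1 + n^{-1}\bar\lambda , \qquad \bar\lambda = p^{-1}\sum_{j=1}^{p}\lambda_j , \qquad \mu_j = E[(V - EV)/\beta]^j ,$$

$$d_k = (-1)^k \sum_{j=k}^{r}\binom{j}{k} c_j \qquad (k = 0, 1, \dots, r),$$

$$c_0 = 1 , \qquad c_k = (2k)^{-1}\sum_{j=0}^{k-1} B_{k-j}\, c_j \qquad (k = 1, 2, \dots, r),$$

$$B_k = \sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n\beta}\right)^{k} \qquad (k = 0, 1, \dots, r),$$

and $\{a_{k,j}\}_{k = 0, 1, \dots;\ j = 0, 1, \dots}$ satisfy the following relations:

$$a_{k,0} = 1 \qquad (k = 0, 1, \dots),$$

$$a_{0,1} = \tfrac12\left(\tfrac12 (s - EV)/\beta\right)^{\frac12 p - 1} \exp\{-\tfrac12 (s - EV)/\beta\}\big/\Gamma(\tfrac12 p) ,$$

$$a_{k+1,1} = a_{k,1}\,\frac{s - EV}{(p + 2k)\,\beta} \qquad (k = 0, 1, \dots),$$

and for any $k \ge 0$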


.,*2  ■ -Km  * [i&]  v,} 


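Proof: By Theorem 2.2.1, we have that

$$S_0 = \sum_{j=1}^{p}(1 + n^{-1}\lambda_j)\,W_{1j} + \sum_{j=1}^{p} n^{-1}\lambda_j\, W_{2j} = U + V ,$$

where $W_{1j}$ is $\chi^2_1$, $W_{2j}$ is $\chi^2_{n-1}$, and the $W$'s are independent. Thus $U$ is a quadratic form in normal variables. For $n$ sufficiently large, $n > \max_{1 \le j \le p}(\lambda_j - 2\bar\lambda)$. This implies $1 + n^{-1}\bar\lambda > \tfrac12(1 + n^{-1}\lambda_{\max})$. We can therefore express the distribution function $G(u)$ of $U$ by a Laguerre series expansion, as described in Johnson and Kotz [11], chapter 29, section 5:

$$1 - G(u) = \sum_{j=0}^{\infty} c_j\, j!\,\frac{\Gamma(\frac12 p)}{\Gamma(\frac12 p + j)} \int_u^\infty L_j^{(\frac12 p - 1)}\!\left(\frac{x}{2\beta}\right) \beta^{-1} f_p\!\left(\frac{x}{\beta}\right) dx , \tag{2.3.10}$$

where $\beta = 1 + n^{-1}\bar\lambda$, $f_p(x)$ is the density function of a $\chi^2_p$ random variable,

$$c_0 = 1 , \qquad c_k = (2k)^{-1}\sum_{j=0}^{k-1} B_{k-j}\, c_j \quad (k \ge 1) , \qquad B_k = \sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n\beta}\right)^{k} \quad (k \ge 0) ,$$

and $L_j^{(\gamma)}(x)$ is the Laguerre polynomial defined in (2.3.7). By substitution of the definition of $L_j^{(\frac12 p - 1)}(x)$, we can write (2.3.10) in the equivalent form

$$1 - G(u) = \sum_{j=0}^{r} c_j \sum_{k=0}^{j} (-1)^k \binom{j}{k} \Pr(\chi^2_{p+2k} > \beta^{-1}u) + R_r(u) , \tag{2.3.11}$$

where $R_r(u)$ is the sum of the remaining terms in the infinite sum.

Gurland [9] has shown that

$$|R_r(u)| \le M \max_{1 \le j \le p}\left|\frac{\bar\lambda - \lambda_j}{n\beta}\right|^{r+1} = O(n^{-r-1}) .$$

Alternatively, we may observe that $|B_k| \le p\,n^{-k}$ when $\max_{1 \le j \le p}|\bar\lambda - \lambda_j| < 1$. Hence for $k \ge 1$, $|c_k| \le (p/n)^k$. Thus for sufficiently large $n$,

$$|R_r(u)| \le \sum_{j=r+1}^{\infty} |c_j| \sum_{k=0}^{j}\binom{j}{k}\Pr(\chi^2_{p+2k} > \beta^{-1}u) \le \sum_{j=r+1}^{\infty}\left(\frac{2p}{n}\right)^{j} = o(n^{-r}) .$$

Upon switching the order of summation in (2.3.11), we obtain

$$1 - G(u) = \sum_{k=0}^{r} d_k \Pr(\chi^2_{p+2k} > \beta^{-1}u) + o(n^{-r}) ,$$

where $d_k = (-1)^k \sum_{j=k}^{r}\binom{j}{k}\, c_j$. Therefore

$$\Pr(S_0 > s) = \Pr(U + V > s) = \sum_{k=0}^{r} d_k \Pr(\chi^2_{p+2k} + \beta^{-1}V > \beta^{-1}s) + o(n^{-r}) . \tag{2.3.12}$$

But $V = n^{-1}\sum_{j=1}^{p}\lambda_j W_{2j}$, so the $m$-th cumulant of $V$ is

$$\kappa_m(V) = n^{-m}\sum_{j=1}^{p}\lambda_j^m\, \kappa_m(W_{2j}) ,$$

and hence, since $W_{2j}$ is a $\chi^2_{n-1}$ random variable, $\kappa_m(V) = 2^{m-1}(m-1)!\,(n-1)\,n^{-m}\sum_{j=1}^{p}\lambda_j^m = O(n^{1-m})$.

$$a_{k+1,1} = \frac{s - EV}{\beta\,(p+2k)}\; a_{k,1}$$

as required. The exact form of (2.3.9) results from substituting (2.3.13) in (2.3.12) and omitting terms of order $o(n^{-r})$. Which terms these are is determined by the facts that $d_k = o(n^{-k+1})$ and $\mu_2 = O(n^{-1})$, and that for $j \ge 2$, $\mu_{2j-1}$ and $\mu_{2j}$ are both $O(n^{-j})$. □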


Since (2.3.9) is quite a complicated expression, one would hope that a good approximation to the distribution of $S_0$ could be obtained by including only a few terms. In the next section we shall examine the accuracy of (2.3.9) as an approximation and find that this is often the case. Several remarks should be made here concerning portions of (2.3.9) which are actually not quite as complicated as they appear there:

Remark 1. With the particular choice $\beta = 1 + n^{-1}\bar\lambda$, we have $B_1 = 0$ and $c_1 = 0$, simplifying some of the formulas for the coefficients.

Remark 2. Since $\mu_1 = 0$, the first term in the inner sum ($j = 1$) is really

$$d_{k-1}\, a_{k-1,2}\, \mu_2/2! ,$$

but it was written in the more general form for notational convenience.

Remark 3. For a given value of $k$, the term within braces in (2.3.9) is $o(n^{-k+1})$. That is, terms in the sum over $k$ should tend to decrease as $k$ increases, suggesting that the form given will provide a reasonable approximation.
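The Python sketch below illustrates (2.3.9) in its two shortest truncations, under several assumptions:

- $p$ is even.
- $\beta = 1 + n^{-1}\bar\lambda$, the choice of Remark 1. Then $c_1 = 0$, so $d_0 = 1$ and $d_1 = 0$.
- The $r = 1$ correction is therefore $d_0 a_{0,2}\mu_2/2!$, by Remark 2 with $\mu_1 = 0$.
- $a_{0,2}$ is taken to be the Lemma 2.3.1 coefficient for a $\chi^2_p$ variable, $a_{0,2} = -f_p'\{(s - EV)/\beta\}$. This is our reading, not a formula quoted from the theorem.
- The tables' $\omega$ is read as $\delta$, so $\lambda_j = 1 + 2\omega$.

Under these assumptions the sketch reproduces the $r = 0$ and $r = 1$ columns of Table 2.4.1 in the cases checked.

```python
import math


def chi2_sf_even(x: float, k: int) -> float:
    """Pr(chi2_k > x) for even k."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    return sum(math.exp(-h + j * math.log(h) - math.lgamma(j + 1)) for j in range(k // 2))


def chi2_pdf(x: float, k: int) -> float:
    """f_k(x) for x > 0."""
    return math.exp((k / 2 - 1) * math.log(x / 2) - x / 2 - math.log(2) - math.lgamma(k / 2))


def expansion_r0_r1(s: float, lambdas, n: int):
    """Truncations r = 0 and r = 1 of (2.3.9) with beta = 1 + lambda_bar/n; even p only."""
    p = len(lambdas)
    if p % 2:
        raise ValueError("this sketch handles even p only")
    beta = 1 + sum(lambdas) / (p * n)
    ev = (n - 1) * sum(lambdas) / n                                   # E V, as E W_2j = n - 1
    mu2 = 2 * (n - 1) * sum(l * l for l in lambdas) / (n * beta) ** 2  # E[(V - EV)/beta]^2
    x = (s - ev) / beta
    r0 = chi2_sf_even(x, p)
    a02 = chi2_pdf(x, p) * (0.5 - (p / 2 - 1) / x)                   # -f_p'(x), our reading of a_{0,2}
    return r0, r0 + a02 * mu2 / 2


if __name__ == "__main__":
    print(expansion_r0_r1(6.0, [1.5, 1.5], n=10))    # p = 2, n = 10, omega = .25, s = 6
    print(expansion_r0_r1(12.0, [1.5] * 4, n=10))    # p = 4, n = 10, omega = .25, s = 12
```

Higher truncations need the remaining $a_{k,j}$ and $d_k$, which this sketch does not attempt.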

2.4 Accuracy of the asymptotic expansion.

In the previous section, we derived an asymptotic expansion for the upper tail of the distribution of $S_0$. However, the error made in truncating after a given number of terms has not yet been addressed. Further, the expansion is not uniform with respect to $s$. Thus the question of the accuracy of the approximation based on the asymptotic expansion naturally arises. The results here give the exact distribution of $S_0$ only under limited conditions, but we can still investigate the general behavior of the asymptotic approximation. The discussion that follows is illustrated by selected values of the upper tail probability, computed from Theorem 2.2.2. Alongside them are the approximations provided by Theorem 2.3.1 with successively more terms ($r = 0, 1, 2, 3$). Although Theorem 2.3.1 is valid only for $r \ge 1$, the case $r = 0$, i.e., $\Pr(S > s) = \Pr(X > s - EY)$, is included here for comparison. The values of $s$ have been selected to give tail probabilities of similar size. These results are displayed in Tables 2.4.1(a)-(d). Table 2.4.2 gives selected values of the approximate upper tail probability (using $r = 3$ in Theorem 2.3.1) for $\Omega = \omega I$ and other slightly different forms of $\Omega$.
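To make the comparison concrete, the Python snippet below copies the four $n = 10$, $\omega = .25$ rows of Table 2.4.1(a). It summarizes the absolute and relative errors of each truncation. It only reorganizes published numbers, and the choice of rows is arbitrary.

```python
# Rows of Table 2.4.1(a), p = 2, n = 10, omega = .25: (s, exact, r=0, r=1, r=2, r=3)
ROWS = [
    (6, 0.25850, 0.23817, 0.25640, 0.25884, 0.25901),
    (8, 0.10856, 0.09982, 0.10747, 0.10849, 0.10856),
    (10, 0.04551, 0.04184, 0.04504, 0.04547, 0.04550),
    (12, 0.01907, 0.01754, 0.01888, 0.01906, 0.01907),
]


def max_abs_error(rows):
    """Largest |approximation - exact| over the rows, one value for each r = 0..3."""
    return [max(abs(row[2 + r] - row[1]) for row in rows) for r in range(4)]


def max_rel_error(rows):
    """Largest |approximation - exact| / exact over the rows, for each r = 0..3."""
    return [max(abs(row[2 + r] - row[1]) / row[1] for row in rows) for r in range(4)]


if __name__ == "__main__":
    for r, (ae, rel) in enumerate(zip(max_abs_error(ROWS), max_rel_error(ROWS))):
        print(f"r={r}: max abs error {ae:.5f}, max rel error {rel:.3%}")
```

For these rows the largest absolute error falls from about .020 ($r = 0$) to .0021 ($r = 1$) and .00034 ($r = 2$). At $s = 6$, however, the $r = 3$ value (.25901) is slightly farther from the exact .25850 than the $r = 2$ value, so adding terms does not help uniformly.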




TABLE 2.4.1(a): Exact and Approximate Values of $\Pr(S_0 > s)$ for $p = 2$

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  10    .25     6    .25850    .23817   .25640   .25884   .25901
                8    .10856    .09982   .10747   .10849   .10856
               10    .04551    .04184   .04504   .04547   .04550
               12    .01907    .01754   .01888   .01906   .01907
        .10     5    .29731    .28143   .29597   .29747   .29754
                7    .12184    .11524   .12120   .12181   .12184
                9    .04989    .04719   .04963   .04988   .04989
               12    .01307    .01236   .01300   .01307   .01307
        .05     5    .26895    .25657   .26791   .26896   .26901
                7    .10927    .10422   .10882   .10925   .10927
                9    .04439    .04233   .04420   .04438   .04439
               12    .01149    .01096   .01144   .01149   .01149
        .01     5    .24795    .23798   .24715   .24794   .24796
                7    .10007    .09604   .09974   .10006   .10007
                9    .04038    .03876   .04025   .04038   .04038
               12    .01035    .00994   .01032   .01035   .01035

  20    .25     6    .24252    .23105   .24174   .24250   .24254
                8    .09567    .09114   .09536   .09566   .09567
               10    .03774    .03595   .03761   .03773   .03774
               12    .01489    .01418   .01484   .01488   .01489
        .10     5    .28610    .27720   .28564   .28610   .28611
                7    .11138    .10792   .11120   .11138   .11138
                9    .04336    .04201   .04329   .04336   .04336
               12    .01053    .01020   .01052   .01053   .01053
        .05     5    .25862    .25179   .25829   .25861   .25862
                7    .10023    .09759   .10011   .10023   .10023
                9    .03885    .03782   .03880   .03885   .03885
               12    .00937    .00913   .00936   .00937   .00937
        .01     5    .23845    .23300   .23821   .23845   .23845
                7    .09208    .08998   .09199   .09208   .09208
                9    .03556    .03475   .03552   .03556   .03556
               12    .00853    .00834   .00853   .00853   .00853

TABLE 2.4.1(a) (continued)

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  40    .25     6    .23320    .22720   .23299   .23320   .23321
                8    .08895    .08666   .08887   .08895   .08895
               10    .03393    .03305   .03390   .03393   .03393
               12    .01294    .01261   .01293   .01294   .01294
        .10     5    .27960    .27492   .27947   .27960   .27960
                7    .10590    .10413   .10585   .10590   .10590
                9    .04011    .03944   .04009   .04011   .04011
               12    .00935    .00919   .00934   .00935   .00935
        .05     5    .25281    .24925   .25273   .25282   .25282
                7    .09553    .09418   .09550   .09553   .09553
                9    .03610    .03559   .03608   .03610   .03610
               12    .00838    .00827   .00838   .00838   .00838
        .01     5    .23321    .23037   .23315   .23321   .23321
                7    .08795    .08688   .08793   .08795   .08795
                9    .03317    .03277   .03316   .03317   .03317
               12    .00768    .00759   .00768   .00768   .00768



TABLE 2.4.1(b): Exact and Approximate Values of $\Pr(S_0 > s)$ for $p = 4$

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  10    .25    12    .23599    .21950   .23573   .23629   .23605
               15    .08778    .07964   .08712   .08780   .08779
               17    .04343    .03899   .04298   .04342   .04343
               20    .01448    .01286   .01430   .01447   .01448
        .10    11    .21249    .20183   .21221   .21257   .21250
               13    .10772    .10118   .10734   .10774   .10773
               15    .05259    .04902   .05233   .05259   .05259
               19    .01166    .01076   .01158   .01166   .01166
        .05    10    .25505    .24492   .25494   .25513   .25506
               13    .09135    .08644   .09107   .09135   .09135
               15    .04396    .04135   .04378   .04396   .04396
               19    .00952    .00888   .00946   .00952   .00952
        .01    10    .22758    .21924   .22741   .22762   .22758
               12    .11436    .10922   .11411   .11436   .11436
               14    .05524    .05244   .05506   .05523   .05524
               18    .01196    .01127   .01190   .01195   .01196

  20    .25    12    .21949    .20981   .21934   .21954   .21950
               14    .10817    .10235   .10792   .10817   .10817
               16    .05118    .04810   .05101   .05118   .05118
               20    .01061    .00989   .01056   .01061   .01061
        .10    10    .28134    .27401   .28134   .28137   .28135
               13    .09648    .09297   .09636   .09648   .09648
               15    .04486    .04305   .04478   .04486   .04486
               19    .00901    .00860   .00899   .00901   .00901
        .05    10    .24408    .23829   .24403   .24410   .24408
               12    .11917    .11564   .11907   .11917   .11917
               15    .05571    .05384   .05564   .05571   .05571
               18    .01123    .01080   .01121   .01123   .01123
        .01    10    .21716    .21246   .21711   .21717   .21717
               12    .10475    .10199   .10468   .10476   .10476
               14    .04852    .04709   .04847   .04852   .04852
               18    .00964    .00932   .00963   .00964   .00964

TABLE 2.4.1(b) (continued)

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  40    .25    12    .20982    .20461   .20978   .20984   .20983
               14    .10002    .09702   .09996   .10003   .10003
               16    .04578    .04424   .04573   .04578   .04578
               20    .00888    .00854   .00887   .00888   .00888
        .10    10    .27474    .27078   .27474   .27475   .27474
               13    .09058    .08877   .09054   .09058   .09058
               15    .04101    .04010   .04099   .04101   .04101
               18    .01190    .01151   .01189   .01190   .01190
        .05    10    .23784    .23475   .23784   .23786   .23785
               12    .11337    .11155   .11335   .11338   .11338
               14    .05172    .05078   .05070   .05072   .05072
               18    .00992    .00972   .00992   .00992   .00992
        .01    10    .21138    .20889   .21137   .21139   .21139
               12    .09972    .09830   .09970   .09973   .09973
               14    .04515    .04443   .04514   .04515   .04515
               18    .00857    .00841   .00856   .00857   .00857



TABLE 2.4.1(c): Exact and Approximate Values of $\Pr(S_0 > s)$ for $p = 6$

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  10    .25    17    .27453    .25784   .27518   .27483   .27453
               21    .09096    .08189   .09041   .09101   .09097
               23    .04932    .04373   .04885   .04933   .04933
               25    .02599    .02276   .02567   .02598   .02600
        .10    16    .21431    .20371   .21428   .21439   .21431
               20    .06519    .06039   .06491   .06519   .06519
               22    .03407    .03127   .03386   .03406   .03407
               24    .01735    .01580   .01722   .01734   .01735
        .05    15    .23594    .22646   .23597   .23600   .23594
               18    .09828    .09264   .09805   .09830   .09828
               20    .05207    .04865   .05188   .05208   .05208
               23    .01901    .01757   .01890   .01900   .01901
        .01    14    .26908    .26044   .26920   .26914   .26909
               17    .11319    .10774   .11301   .11321   .11320
               20    .04327    .04070   .04313   .04327   .04327
               22    .02195    .02053   .02186   .02195   .02195

  20    .25    17    .25868    .24850   .25884   .25874   .25868
               20    .10594    .09978   .10575   .10595   .10594
               22    .05520    .05148   .05503   .05520   .05520
               25    .01955    .01803   .01946   .01955   .01955
        .10    15    .26791    .26106   .26799   .26793   .26791
               18    .10825    .10410   .10816   .10826   .10825
               20    .05581    .05332   .05573   .05581   .05581
               23    .01943    .01843   .01939   .01943   .01943
        .05    15    .22411    .21865   .22412   .22413   .22412
               18    .08781    .08478   .08774   .08781   .08781
               20    .04456    .04281   .04451   .04456   .04456
               22    .02192    .02100   .02188   .02192   .02192
        .01    14    .25899    .25399   .25903   .25901   .25900
               17    .10300    .10005   .10295   .10301   .10301
               19    .05258    .05084   .05253   .05258   .05258
               22    .01805    .01736   .01803   .01805   .01805

TABLE 2.4.1(c) (continued)

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  40    .25    17    .24905    .24345   .24910   .24907   .24907
               20    .09733    .09412   .09728   .09734   .09733
               22    .04910    .04724   .04905   .04910   .04910
               25    .01655    .01583   .01653   .01655   .01655
        .10    15    .26090    .25719   .26093   .26092   .26091
               18    .10162    .09946   .10160   .10163   .10162
               20    .05107    .04981   .05104   .05107   .05107
               23    .01710    .01661   .01708   .01710   .01710
        .05    15    .21744    .21452   .21745   .21746   .21746
               18    .08235    .08080   .08233   .08236   .08235
               20    .04082    .03944   .04080   .04082   .04082
               23    .01344    .01312   .01344   .01345   .01345
        .01    14    .25325    .25058   .25328   .25328   .25327
               17    .09765    .09613   .09764   .09766   .09766
               19    .04878    .04790   .04877   .04878   .04878
               22    .01619    .01586   .01619   .01619   .01619



TABLE 2.4.1(d): Exact and Approximate Values of $\Pr(S_0 > s)$ for $p = 8$

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  10    .25    23    .24078    .22487   .24132   .24099   .24077
               26    .11584    .10459   .11544   .11593   .11584
               28    .06771    .06000   .06722   .06774   .06772
               30    .03836    .03344   .03794   .03836   .03836
        .10    20    .26630    .25514   .26665   .26638   .26630
               24    .09635    .08952   .09609   .09637   .09635
               26    .05460    .05012   .05435   .05461   .05460
               28    .03002    .02727   .02983   .03002   .03002
        .05    19    .27593    .26629   .27623   .27599   .27593
               23    .09921    .09323   .09901   .09922   .09921
               25    .05596    .05204   .05577   .05597   .05596
               27    .03060    .02820   .03045   .03060   .03060
        .01    19    .23504    .22683   .23514   .23508   .23504
               22    .10748    .10195   .10732   .10749   .10748
               25    .04496    .04207   .04481   .04496   .04496
               27    .02421    .02248   .02410   .02421   .02421

  20    .25    23    .22347    .21384   .22360   .22351   .22347
               26    .09956    .09335   .09941   .09958   .09957
               28    .05511    .05110   .05495   .05512   .05511
               30    .02952    .02711   .02939   .02952   .02952
        .10    20    .25360    .24700   .25370   .25362   .25361
               23    .11314    .10864   .11307   .11315   .11315
               26    .04577    .04347   .04570   .04578   .04577
               28    .02403    .02269   .02398   .02403   .02403
        .05    19    .26477    .25909   .26486   .26479   .26478
               22    .11825    .11432   .11820   .11826   .11826
               25    .04778    .04576   .04772   .04778   .04778
               27    .02504    .02386   .02500   .02504   .02504
        .01    19    .22387    .21915   .22391   .22390   .22389
               22    .09707    .09408   .09703   .09708   .09708
               24    .05270    .05080   .05265   .05270   .05270
               27    .01983    .01899   .01980   .01983   .01983

TABLE 2.4.1(d) (continued)

   n      ω     s     Exact     Approximate Probability
                      Prob.     r = 0    r = 1    r = 2    r = 3
  40    .25    22    .27576    .27014   .27588   .27579   .27578
               26    .09083    .08760   .09079   .09084   .09084
               28    .04873    .04671   .04868   .04873   .04873
               30    .02528    .02411   .02524   .02528   .02528
        .10    20    .24622    .24262   .24626   .24624   .24623
               23    .10612    .10377   .10610   .10612   .10612
               25    .05715    .05566   .05713   .05715   .05715
               28    .02117    .02051   .02115   .02117   .02117
        .05    19    .25832    .25526   .25838   .25836   .25834
               22    .11182    .10977   .11182   .11184   .11184
               25    .04368    .04266   .04366   .04368   .04368
               27    .02236    .02178   .02235   .02236   .02236
        .01    18    .28245    .27977   .28250   .28248   .28248
               22    .09162    .09008   .09162   .09163   .09163
               24    .04871    .04775   .04870   .04872   .04872
               26    .02503    .02447   .02502   .02503   .02503

TABLE 2.4.2: Approximate Values of Pr(S_0 > s) Using r = 3

p = 2
       ω1: .10      .11      .12      .15      .05      .06      .075
       ω2: .10      .09      .08      .05      .05      .04      .025
  n    s
 10    7   .12184   .12184   .12185   .12190   .10927   .10927   .10928
       9   .04989   .04989   .04990   .04992   .04439   .04439   .04439
      12   .01307   .01307   .01308   .01308   .01149   .01149   .01149
 20    7   .11138   .11138   .11139   .11140   .10023   .10023   .10024
       9   .04336   .04336   .04336   .04337   .03885   .03885   .03885
      12   .01053   .01053   .01053   .01054   .00937   .00937   .00937

p = 3
       ω1: .10      .11      .11      .12      .15      .20      .05      .06      .075
       ω2: .10      .10      .11      .09      .10      .05      .05      .05      .050
       ω3: .10      .09      .08      .09      .05      .05      .05      .04      .025
  n    s
 10   10   .11656   .11656   .11656   .11656   .11660   .11667   .10154   .10154   .10154
      12   .05314   .05314   .05315   .05315   .05316   .05320   .04575   .04575   .04576
      16   .01049   .01049   .01049   .01049   .01049   .01050   .00885   .00885   .00885
 20   10   .10550   .10550   .10550   .10550   .10552   .10555   .09206   .09206   .0920?*
      12   .04577   .04577   .04577   .04577   .04578   .04579   .03961   .03961   .03962
      16   .00817   .00817   .00817   .00817   .00817   .00818   .00698   .00698   .00698

p = 4
       ω1: .10      .11      .12      .15      .15      .16      .05      .06      .08
       ω2: .10      .10      .11      .10      .125     .08      .05      .05      .06
       ω3: .10      .10      .09      .10      .075     .08      .05      .05      .04
       ω4: .10      .09      .08      .05      .05      .08      .05      .04      .02
  n    s
 10   13   .10773   .10773   .10773   .10776   .10776   .10775   .09135   .09135   .09136
      15   .05259   .05260   .05260   .05261   .05262   .05261   .04396   .04396   .04397
      18   .01712   .01712   .01712   .01712   .01712   .01712   .01404   .01404   .01405
 20   13   .09648   .09648   .09648   .09649   .09649   .09649   .08184   .08184   .01814†
      15   .04486   .04486   .04486   .04487   .04487   .04487   .03765   .03765   .03765
      18   .01356   .01356   .01356   .01356   .01356   .01356   .01123   .01123   .01123

 * Last digit illegible in the source.
 † Entry as it appears in the source; the neighboring entries suggest a value near .0818.

(Computer programs used for the computations are available from the author.)

As expected, the accuracy of the approximation for a given number of terms improves as $n$ increases. However, the approximation is quite good for $n$ as small as 10 if $s$ is sufficiently different from $EV$. For any values of $p$, $n$, and $\omega$, the approximation worsens as $s$ approaches $EV$. If $s$ is very close to $EV$, the approximation can be extremely inaccurate, especially for odd $p$, for which higher derivatives of $h_p(y)$ become infinite. Obviously, the approximation should not be used for $s$ close to $EV$. If $s \le EV$, other approximations may be possible; however, this question has not been addressed here. In other cases the approximation appears to be quite good, and the restriction to upper tail probabilities does not seem to be a serious drawback.

There appears to be some tendency for the approximation to worsen as $p$ increases. However, this tendency is slight and does not appear to cause significant problems in using the expansion as an approximation. Finally, we note that, in most cases, a relatively accurate approximation can be obtained by including only terms of order $n^{-1}$. The computations are simpler in that case, involving only the variance of a linear combination of chi-squared random variables and not the more complicated higher moments.

The figures in Table 2.4.2 indicate that the distribution of $S_0$ is quite robust with respect to variability of $\Omega$. The major factor in determining the distribution for a given value of $p$ is the average of the $\omega$'s (or $\lambda$'s). As would be expected, the tail probability increases as the sum of absolute deviations from $\bar\omega$ increases. However, this effect is relatively unimportant when compared to the effect of $\bar\omega$, $n$, and the difference between $s$ and $EV$.

In Chapter 3, we shall investigate the noncentral distribution of $S_0$ and derive some results under more complicated models. There too, additional extensions and unsolved questions are discussed.




3.  CLASSIFICATION  UNDER  MORE  COMPLICATED  MODELS 


3.1 Noncentral distribution of $S_0$.

When $S_{01}, S_{02}, \ldots, S_{0n}$ are as defined in paragraph (A) on pages 8-9, we have derived the null distribution of $S_0 = n^{-1}\sum_{j=1}^{n} S_{0j}$, that is, its distribution when $X_0$ and $X_1, \ldots, X_n$ are identically distributed. The resulting distribution reduces to that of a particular quadratic form in independent normal variables with zero means, or equivalently to a linear combination of independent central chi-squared variates.

For calculating the probabilities of misclassification in the case already studied, as well as in cases with more than one population to which the new object may be assigned, we are also interested in the distribution of $S_0$ when the distribution of $X_0$ is different from the common distribution of $X_1, \ldots, X_n$. When $EX_0 = \mu_0 \ne \mu = EX_i$ $(i = 1, \ldots, n)$, using the same arguments as in the central case, we can clearly express $S_0$ as the same quadratic form with the normal random variables having nonzero means, or in the equivalent form of a linear combination of independent noncentral chi-squared random variables. As we shall see in the following theorem, however, the actual representation obtained is somewhat simpler.
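As a rough check on the model of paragraph (A), the following Python sketch simulates the positions and measurement errors directly and compares the sample mean of $S_0$ with $E S_0 = \theta_0'\theta_0 + 2p + 2\,\mathrm{tr}\,\Omega$, where $\theta_0 = \Sigma^{-1/2}(\mu_0 - \mu)$. That expectation follows from $E(X_0 + Y_{0j} - X_j - Y_{j0}) = \mu_0 - \mu$ and $\mathrm{Cov}(X_0 + Y_{0j} - X_j - Y_{j0}) = 2\Sigma + 2\Delta$. To stay dependency-free the sketch assumes diagonal $\Sigma$ and $\Delta$; the function names and numerical values are illustrative choices, not taken from the text.

```python
import random


def simulate_S0(mu0, mu, sd_x, sd_y, n, reps, seed=1):
    """Draw `reps` replicates of S_0 = n^{-1} sum_j S_0j under model (A),
    assuming Sigma = diag(sd_x**2) and Delta = diag(sd_y**2) (illustrative choice)."""
    rng = random.Random(seed)
    p = len(mu)
    draws = []
    for _ in range(reps):
        x0 = [rng.gauss(mu0[k], sd_x[k]) for k in range(p)]    # X_0, shared by every pair
        total = 0.0
        for _ in range(n):
            xj = [rng.gauss(mu[k], sd_x[k]) for k in range(p)]  # X_j
            for k in range(p):
                # (X_0 + Y_0j) - (X_j + Y_j0), with fresh errors for each pair (0, j)
                diff = (x0[k] + rng.gauss(0.0, sd_y[k])) - (xj[k] + rng.gauss(0.0, sd_y[k]))
                total += diff * diff / sd_x[k] ** 2             # Sigma^{-1} weighting
        draws.append(total / n)
    return draws


def expected_S0(mu0, mu, sd_x, sd_y):
    """theta_0' theta_0 + 2p + 2 tr(Omega) for diagonal Sigma and Delta."""
    return sum(((a - b) ** 2 + 2 * sx ** 2 + 2 * sy ** 2) / sx ** 2
               for a, b, sx, sy in zip(mu0, mu, sd_x, sd_y))


args = ([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.3, 0.3])
draws = simulate_S0(*args, n=10, reps=4000)
print(sum(draws) / len(draws), expected_S0(*args))
```

With these settings the closed-form mean is 5.36, and the simulated mean should fall close to it.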


Theorem 3.1.1. Let $S_{01}, S_{02}, \ldots, S_{0n}$ be as defined in paragraph (A), pages 8-9, with $\mu_0 \ne \mu$. Then $S_0 = n^{-1}\sum_{j=1}^{n} S_{0j}$ can be represented as

$$S_0 = \sum_{j=1}^{p}\left\{(1 + n^{-1}\lambda_j)\,W_{1j} + n^{-1}\lambda_j\,W_{2j}\right\}, \qquad (3.1.1)$$

where $\{W_{1j}, W_{2j}\}_{j=1,\ldots,p}$ are mutually independent random variables, $W_{1j}$ being a noncentral chi-squared random variable with one degree of freedom and noncentrality parameter $\eta_j^2$ ($\eta_j$ is defined in (3.1.3) in the proof), $W_{2j}$ being a central chi-squared random variable with $n-1$ degrees of freedom, and $\lambda_1, \ldots, \lambda_p$ are the characteristic roots of $I_p + 2\Sigma^{-1}\Delta$.

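Proof: We can still write (as in the central case considered in Theorem 2.2.1) $S_0 = n^{-1}U'U$, where

$$U' = (U_{01}', U_{02}', \ldots, U_{0n}') \quad\text{and}\quad U_{0j} = \Sigma^{-1/2}(X_0 + Y_{0j} - X_j - Y_{j0}).$$

Thus $U_{0j}$ is distributed as $N_p(\theta_0, 2(I_p + \Omega))$, with $\Omega = \Sigma^{-1/2}\Delta\Sigma^{-1/2}$ as before, and $U$ is $N_{np}(\theta, V)$, where

$$\theta = \begin{pmatrix}\theta_0\\ \theta_0\\ \vdots\\ \theta_0\end{pmatrix}, \qquad V = \begin{pmatrix} 2I_p + 2\Omega & I_p & \cdots & I_p\\ I_p & 2I_p + 2\Omega & \cdots & I_p\\ \vdots & & \ddots & \vdots\\ I_p & I_p & \cdots & 2I_p + 2\Omega\end{pmatrix},$$

and $\theta_0 = \Sigma^{-1/2}(\mu_0 - \mu)$.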

Then, as before, by an appropriate transformation, we can express $S_0$ as

$$S_0 = n^{-1}\sum_{j=1}^{np}\alpha_j (Z_j + \eta_j)^2, \qquad (3.1.2)$$

where $Z_1, \ldots, Z_{np}$ are independent unit normal random variables, $\eta_j$ is the same function of the $\theta$'s as $Z_j$ is of the $U$'s, and $\alpha_1, \ldots, \alpha_{np}$ are the characteristic roots of $V$. Since the $\alpha$'s are the same as in Theorem 2.2.1, all that remains to prove the result is to determine the values of $\eta_1, \ldots, \eta_{np}$.

To find the $\eta$'s, we must be more specific about the transformation leading to (3.1.2). Since $V$ is symmetric we can write $V = QDQ'$, where $D$ is a diagonal matrix containing the characteristic roots of $V$ and $Q$ is the associated matrix of orthogonal characteristic vectors. Let $D^{1/2}$ be the square root of $D$ (i.e., $D^{1/2}D^{1/2} = D$) and $D^{-1/2}$ its inverse. Let $Z = D^{-1/2}Q'(U - \theta)$. Then $Z$ is normal with mean $0$ and covariance matrix $I_{np}$. Since $U = QD^{1/2}Z + \theta$, if we let $\eta = D^{-1/2}Q'\theta$, we have $U = QD^{1/2}(Z + \eta)$ and

$$U'U = (Z + \eta)'D(Z + \eta) = \sum_{j=1}^{np}\alpha_j (Z_j + \eta_j)^2.$$

By Lemma 2.2.3, we know that $p$ of the $\alpha$'s are equal to $n + \lambda_j$ $(j = 1, \ldots, p)$, and the rest are repetitions of $\lambda_j$ $(j = 1, \ldots, p)$. Assume that we number the $\alpha$'s such that $\alpha_i = n + \lambda_i$ and $\alpha_{i+kp} = \lambda_i$ for $i = 1, 2, \ldots, p$; $k = 1, 2, \ldots, n-1$. Let $\Lambda$ be the diagonal matrix containing $\lambda_1, \ldots, \lambda_p$ and $R$ the associated matrix of orthogonal characteristic vectors of $I_p + 2\Omega$. Then by Lemma 2.2.3, we have


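$$Q = \begin{pmatrix}
R/\sqrt{n} & R/\sqrt{2} & R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)}\\
R/\sqrt{n} & -R/\sqrt{2} & R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)}\\
R/\sqrt{n} & 0 & -2R/\sqrt{6} & \cdots & R/\sqrt{n(n-1)}\\
\vdots & \vdots & \vdots & & \vdots\\
R/\sqrt{n} & 0 & 0 & \cdots & -(n-1)R/\sqrt{n(n-1)}
\end{pmatrix},$$

and hence

$$\eta = D^{-1/2}Q'\theta = \begin{pmatrix} D^{*}R'\theta_0\\ 0\end{pmatrix}, \qquad (3.1.3)$$

where $D^{*}$ is a diagonal matrix with elements $(1 + n^{-1}\lambda_j)^{-1/2}$. The conclusion follows on substitution of (3.1.3) and the $\alpha$'s in (3.1.2). □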
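The diagonalization used in this proof can be checked numerically. The NumPy sketch below builds $V$ (diagonal blocks $2I_p + 2\Omega$, off-diagonal blocks $I_p$), forms $Q = H_n \otimes R$ with $H_n$ the Helmert-type matrix whose columns follow the pattern displayed above ($1/\sqrt{n}$, then $(1,-1,0,\ldots)/\sqrt{2}$, $(1,1,-2,0,\ldots)/\sqrt{6}$, and so on), and confirms that $Q'VQ = \mathrm{diag}(nI_p + \Lambda, \Lambda, \ldots, \Lambda)$ and that only the first $p$ entries of $\eta = D^{-1/2}Q'\theta$ are nonzero, agreeing with (3.1.3). The particular $\Omega$, $\theta_0$ and $n$ are arbitrary illustrative values, and the helper names are hypothetical.

```python
import numpy as np


def helmert(n):
    """Orthogonal n x n matrix: first column 1/sqrt(n); column k (k = 2..n) has
    1/sqrt(k(k-1)) in rows 1..k-1, -(k-1)/sqrt(k(k-1)) in row k, zeros below."""
    H = np.zeros((n, n))
    H[:, 0] = 1.0 / np.sqrt(n)
    for k in range(2, n + 1):
        c = 1.0 / np.sqrt(k * (k - 1))
        H[: k - 1, k - 1] = c
        H[k - 1, k - 1] = -(k - 1) * c
    return H


def check_theorem_311(Omega, theta0, n):
    """Return Q'VQ, the predicted diagonal D, eta = D^{-1/2} Q' theta, and (3.1.3)."""
    p = Omega.shape[0]
    # Cov(U): diagonal blocks 2I_p + 2Omega, off-diagonal blocks I_p.
    V = np.kron(np.eye(n), np.eye(p) + 2 * Omega) + np.kron(np.ones((n, n)), np.eye(p))
    lam, R = np.linalg.eigh(np.eye(p) + 2 * Omega)       # I_p + 2Omega = R diag(lam) R'
    Q = np.kron(helmert(n), R)
    D = Q.T @ V @ Q
    D_predicted = np.diag(np.concatenate([n + lam, np.tile(lam, n - 1)]))
    theta = np.tile(theta0, n)                           # n copies of theta_0
    eta = (Q.T @ theta) / np.sqrt(np.diag(D))
    eta_313 = (R.T @ theta0) / np.sqrt(1.0 + lam / n)    # first p entries per (3.1.3)
    return D, D_predicted, eta, eta_313


Omega = np.array([[0.2, 0.05], [0.05, 0.1]])
D, D_predicted, eta, eta_313 = check_theorem_311(Omega, np.array([1.0, -0.5]), n=6)
print(np.allclose(D, D_predicted), np.allclose(eta[:2], eta_313), np.allclose(eta[2:], 0.0))
# True True True
```

Because `eigh` fixes $R$ only up to the signs of its columns, the individual $\eta_j$ may change sign between libraries; the noncentralities $\eta_j^2$ in Theorem 3.1.1 do not.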

The exact distribution of $S_0$ in the noncentral case is even more intractable than in the central case, even when the $\lambda$'s are all equal, and a computational scheme will not be given here. However, an asymptotic expansion similar to that given in Theorem 2.3.1 for the central case can be obtained. It is given in the following theorem.

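Theorem 3.1.2. Let $S_0$, $\lambda_j$, $\eta_j$ $(j = 1, 2, \ldots, p)$ be as defined in Theorem 3.1.1. Then for $s > \sum_{j=1}^{p}\lambda_j$,

$$\Pr(S_0 > s) = \sum_{k=0}^{r} d_k\left\{\Pr\left(\chi^2_{p+2k} > \frac{s - EV}{\beta}\right) + \sum_{j=2}^{2r-2}\frac{a_{jk}\,\mu_j}{j!}\right\} + o(n^{-r+1}), \qquad (3.1.4)$$

where $\beta = 1 + n^{-1}\bar\lambda$, $\bar\lambda = p^{-1}\sum_{j=1}^{p}\lambda_j$, $V = n^{-1}\sum_{j=1}^{p}\lambda_j W_{2j}$, $\mu_j = E[\{(V - EV)/\beta\}^j]$, and

$$d_k = (-1)^k\sum_{j=k}^{r}\binom{j}{k}c_j' \qquad (k = 0, 1, \ldots, r),$$

$$c_0' = 1, \qquad c_k' = (2k)^{-1}\sum_{j=0}^{k-1}B_{k-j}'\,c_j' \qquad (k = 1, 2, \ldots, r),$$

$$B_k' = \sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n + \bar\lambda}\right)^{k} - k\beta^{-1}\sum_{j=1}^{p}\eta_j^2(1 + n^{-1}\lambda_j)\left(\frac{\bar\lambda - \lambda_j}{n + \bar\lambda}\right)^{k-1} \qquad (k = 1, 2, \ldots, r).$$

The coefficients $\{a_{jk}\}_{j=2,\ldots,2r-2;\;k=0,\ldots,r}$ are the same as in Theorem 2.3.1.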


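Proof: From Theorem 3.1.1, we have that $S_0 = U + V$, where

$$U = \sum_{j=1}^{p}(1 + n^{-1}\lambda_j)W_{1j}, \qquad V = n^{-1}\sum_{j=1}^{p}\lambda_j W_{2j}.$$

But as in Theorem 2.3.1, for sufficiently large $n$, we can express the noncentral distribution of $U$ using the Laguerre series expansion described in Johnson and Kotz [11], chapter 29, section 6.3. With the exception that the constants have a different form, the expansion is as in the central case in Theorem 2.3.1, i.e.,

$$\Pr(U > u) = \sum_{j=0}^{\infty}c_j'\,\beta^{-1}j!\,\frac{\Gamma(\tfrac12 p)}{\Gamma(\tfrac12 p + j)}\int_u^{\infty}f_p(x/\beta)\,L_j^{(\frac12 p - 1)}\!\left(\frac{x}{2\beta}\right)dx, \qquad (3.1.5)$$

where $\beta = 1 + n^{-1}\bar\lambda$, $f_p(x)$ is the density of a $\chi^2_p$ random variable, and

$$c_0' = 1, \qquad c_k' = (2k)^{-1}\sum_{j=0}^{k-1}B_{k-j}'\,c_j' \qquad (k \ge 1),$$

$$B_k' = \sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n + \bar\lambda}\right)^{k} - k\beta^{-1}\sum_{j=1}^{p}\eta_j^2(1 + n^{-1}\lambda_j)\left(\frac{\bar\lambda - \lambda_j}{n + \bar\lambda}\right)^{k-1} \qquad (k \ge 1).$$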


Note that the only difference between (3.1.5) and (2.3.10) of Theorem 2.3.1 is the form of the $B$'s. Using the definition of $L_j^{(\alpha)}(x)$ and reversing the order of summation we obtain

$$\Pr(U > u) = \sum_{k=0}^{r}d_k\Pr(\chi^2_{p+2k} > u/\beta) + R_r(u),$$

where

$$R_r(u) = \sum_{j=r+1}^{\infty}c_j'\sum_{k=0}^{j}(-1)^k\binom{j}{k}\Pr(\chi^2_{p+2k} > u/\beta).$$

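In this case, for some $M_1$ (independent of $n$), we have

$$|B_k'| \le M_1\left\{\sum_{j=1}^{p}(1 + n^{-1}\lambda_j)\left|\frac{\bar\lambda - \lambda_j}{n}\right|^{k-1} + \sum_{j=1}^{p}\left|\frac{\bar\lambda - \lambda_j}{n}\right|^{k}\right\} = O(n^{-k+1}),$$

or, for some $M_2$ (independent of $n$), $|B_k'| \le M_2 n^{-k+1}$ for sufficiently large $n$. The recursion for the $c_j'$ then gives a corresponding bound on $|c_j'|$, and since $\sum_{k=0}^{j}\binom{j}{k} = 2^j$,

$$|R_r(u)| \le \sum_{j=r+1}^{\infty}2^j|c_j'| = o(n^{-r+1}).$$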


Therefore we have

$$\Pr(S_0 > s) = \Pr(U + V > s) = \sum_{k=0}^{r}d_k\Pr(\chi^2_{p+2k} + V/\beta > s/\beta) + o(n^{-r+1}). \qquad (3.1.6)$$

But $V$ is as in Theorem 2.3.1, and hence $V/\beta$ satisfies the conditions of Lemma 2.3.2. So by Lemma 2.3.1, for each $k$,

$$\Pr(\chi^2_{p+2k} + V/\beta > s/\beta) = \Pr\left(\chi^2_{p+2k} > \frac{s - EV}{\beta}\right) + \sum_{j=2}^{2r-2}\frac{a_{jk}\,\mu_j}{j!} + o(n^{-r+1}), \qquad (3.1.7)$$

where the $a$'s are the same as in Theorem 2.3.1. The final form (3.1.4) results from substitution of (3.1.7) in (3.1.6). □

It is possible that if terms are collected as in Theorem 2.3.1, those not included will be small, while those in a given grouping will be of a similar order of magnitude. However, due to the more complex nature of the coefficients, this conjecture has not been proved. We also remark here that if a computer program is available to compute the approximate central distribution in Theorem 2.3.1, it would be a relatively minor modification to use it for the noncentral distribution, the only major change being in the computation of the $B$'s.
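The remark above can be illustrated with a sketch of the Laguerre-series part of the expansion, i.e., the approximation $\Pr(U > u) \approx \sum_{k=0}^{r} d_k \Pr(\chi^2_{p+2k} > u/\beta)$ for $U = \sum_j a_j W_{1j}$ with $a_j = 1 + n^{-1}\lambda_j$ and $W_{1j}$ noncentral $\chi^2_1(\eta_j^2)$. It computes $B_k'$, $c_k'$ and $d_k$ by the recursions of Theorem 3.1.2, using $1 - a_j/\beta = (\bar\lambda - \lambda_j)/(n + \bar\lambda)$. It omits the moment corrections $a_{jk}\mu_j/j!$, whose coefficients come from Theorem 2.3.1 and are not reproduced here. The negative sign on the $\eta_j^2$ term in $B_k'$ is the convention under which the series reproduces the Poisson-mixture form of a single noncentral chi-squared variable; treat it as an assumption of this sketch. The chi-squared tail for integer degrees of freedom uses the standard closed forms of the regularized incomplete gamma function.

```python
import math


def chi2_sf(df, x):
    """Pr(chi^2_df > x) for a positive integer df, from closed forms of Q(df/2, x/2)."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:                        # Q(m, y) = e^{-y} sum_{i<m} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))        # Q(1/2, y)
    term = math.exp(-y) * math.sqrt(y) / math.gamma(1.5)
    for i in range((df - 1) // 2):         # add y^a e^{-y} / Gamma(a + 1) for a = 1/2, 3/2, ...
        total += term
        term *= y / (i + 1.5)
    return total


def laguerre_tail(u, a, eta2, r):
    """sum_{k=0}^r d_k Pr(chi^2_{p+2k} > u/beta) for U = sum_j a_j * chi^2_1(eta2[j])."""
    p = len(a)
    beta = sum(a) / p                      # beta = 1 + lambda_bar/n when a_j = 1 + lambda_j/n
    b = [1.0 - aj / beta for aj in a]      # = (lambda_bar - lambda_j) / (n + lambda_bar)
    B = [0.0] * (r + 1)
    for k in range(1, r + 1):              # B'_k; sign of the eta term is this sketch's assumption
        B[k] = (sum(bj ** k for bj in b)
                - k * sum(e * (aj / beta) * bj ** (k - 1) for e, aj, bj in zip(eta2, a, b)))
    c = [1.0] + [0.0] * r                  # c'_0 = 1, c'_k = (2k)^{-1} sum_j B'_{k-j} c'_j
    for k in range(1, r + 1):
        c[k] = sum(B[k - j] * c[j] for j in range(k)) / (2 * k)
    total = 0.0
    for k in range(r + 1):                 # d_k = (-1)^k sum_{j>=k} C(j, k) c'_j
        d_k = (-1) ** k * sum(math.comb(j, k) * c[j] for j in range(k, r + 1))
        total += d_k * chi2_sf(p + 2 * k, u / beta)
    return total


# Check: with one component and a_1 = beta the series should reproduce the Poisson
# mixture sum_k e^{-eta2/2} (eta2/2)^k / k! * Pr(chi^2_{1+2k} > u).
eta2, u = 2.0, 3.0
mixture = sum(math.exp(-eta2 / 2) * (eta2 / 2) ** k / math.factorial(k) * chi2_sf(1 + 2 * k, u)
              for k in range(60))
print(laguerre_tail(u, [1.0], [eta2], r=40), mixture)
```

Only the $B$ computation differs from a central-case program, in line with the remark above: setting every `eta2` entry to zero gives the central coefficients.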


3.2  Classification  of  several  new  objects. 

In  most  instances,  classification  procedures  improve  rapidly  when 
several  new  similar  objects  are  to  be  classified,  rather  than  just  one 
as  in  the  cases  we  have  investigated  thus  far.  So  it  is  natural  to 
question  whether  the  type  of  procedure  based  on  imperfect  distance  data 
behaves  in  such  a manner.  A slightly  different  notation  will  have  to 
be  used  in  this  case;  however,  the  basic  model  is  similar. 

(B) For $k = 1, 2$, let $X_1^{(k)}, \ldots, X_{n_k}^{(k)}$ be independent and identically distributed random variables with the $N_p(\mu_k, \Sigma)$ distribution, and let $\{Y_{ij}^{(1)}, Y_{ji}^{(2)}\}_{i=1,\ldots,n_1;\;j=1,\ldots,n_2}$ be independent and identically distributed random variables with the $N_p(0, \Delta)$ distribution. As before, we interpret the $X$'s as the true (unobservable) positions of the objects in question, and the $Y$'s as errors made in determining those positions. Let

$$S_{ij} = (X_i^{(1)} + Y_{ij}^{(1)} - X_j^{(2)} - Y_{ji}^{(2)})'\,\Sigma^{-1}\,(X_i^{(1)} + Y_{ij}^{(1)} - X_j^{(2)} - Y_{ji}^{(2)}).$$

Thus $S_{ij}$ is the measured or perceived distance between object $i$ in the "known" group and object $j$ in the new group.


The natural extension of $S_0$ to the model in paragraph (B) is then

$$S_B = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S_{ij},$$

the average of all distances between objects in group 1 and those in group 2. The following theorem gives the distribution of $S_B$ under this model, with $\mu_1 = \mu_2$; that is, the null or central case when all objects are actually from the same population.


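Theorem 3.2.1. Let $\{S_{ij}\}_{i=1,\ldots,n_1;\;j=1,\ldots,n_2}$ be as defined in paragraph (B) above, with $\mu_1 = \mu_2$. Let $S_B = (n_1 n_2)^{-1}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S_{ij}$. Then, for $n_1, n_2 \ge 2$, we can represent $S_B$ as

$$S_B = \sum_{j=1}^{p}\left\{(n_2^{-1} + n_1^{-1}\lambda_j)W_{1j} + n_1^{-1}\lambda_j W_{2j} + \left(n_2^{-1} + n_1^{-1}(\lambda_j - 1)\right)W_{3j} + n_1^{-1}(\lambda_j - 1)W_{4j}\right\}, \qquad (3.2.1)$$

where all of the $W$'s are mutually independent random variables and, for $j = 1, 2, \ldots, p$, $W_{1j}$, $W_{2j}$, $W_{3j}$, $W_{4j}$ are distributed respectively as $\chi^2_1$, $\chi^2_{n_1-1}$, $\chi^2_{n_2-1}$, $\chi^2_{(n_1-1)(n_2-1)}$, and $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the characteristic roots of $I_p + 2n_2^{-1}\Omega$, $\Omega = \Sigma^{-1/2}\Delta\Sigma^{-1/2}$. (Note that the matrix of which the $\lambda$'s are the characteristic roots is different from that in the previous cases.)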

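Proof: Let $U_{ij} = \Sigma^{-1/2}(X_i^{(1)} - X_j^{(2)})$ and $T_{ij} = \Sigma^{-1/2}(Y_{ij}^{(1)} - Y_{ji}^{(2)})$. Then $S_B = (n_1 n_2)^{-1}W'W$, where

$$W' = (W_{11}', \ldots, W_{n_1 1}',\; W_{12}', \ldots, W_{n_1 2}',\; \ldots,\; W_{1n_2}', \ldots, W_{n_1 n_2}')$$

and $W_{ij} = U_{ij} + T_{ij}$. But $W$ is normally distributed with mean $0$ and some covariance matrix $V$. Thus, as in earlier proofs, we can represent $S_B$ as

$$S_B = (n_1 n_2)^{-1}\sum_{j=1}^{n_1 n_2 p}\alpha_j Z_j^2,$$

where $\alpha_1, \alpha_2, \ldots, \alpha_{n_1 n_2 p}$ are the characteristic roots of $V$ and $Z_1, \ldots, Z_{n_1 n_2 p}$ are independent unit normal random variables. Thus we need to determine $V$ and its characteristic roots.

First we note that $U_{ij}$ is distributed $N_p(0, 2I_p)$, $T_{ij}$ is distributed $N_p(0, 2\Omega)$, and $U_{ij}$ and $T_{kl}$ are independent for any choice of the indices. The $T$'s are mutually independent, but the $U$'s are not. We can write $V$ as a partitioned matrix with $(k,l)$-th element the $n_1 p \times n_1 p$ matrix which is the covariance of the $k$-th and $l$-th random $n_1 p$-vectors in $W$. But

$$\operatorname{cov}(W_{ij}, W_{ij}) = E\,W_{ij}W_{ij}' = E(U_{ij}U_{ij}' + T_{ij}U_{ij}' + U_{ij}T_{ij}' + T_{ij}T_{ij}') = \operatorname{cov}(U_{ij}, U_{ij}) + \operatorname{cov}(T_{ij}, T_{ij}) = 2I_p + 2\Omega.$$

Similarly, if $i \ne k$, $\operatorname{cov}(W_{ij}, W_{kj}) = I_p$; if $j \ne l$, $\operatorname{cov}(W_{ij}, W_{il}) = I_p$; and if $i \ne k$ and $j \ne l$, $\operatorname{cov}(W_{ij}, W_{kl}) = 0$. Thus

$$V = \begin{pmatrix} V_{11} & \cdots & V_{1n_2}\\ \vdots & & \vdots\\ V_{n_2 1} & \cdots & V_{n_2 n_2}\end{pmatrix},$$

where

$$V_{kk} = \begin{pmatrix} 2I_p + 2\Omega & I_p & \cdots & I_p\\ I_p & 2I_p + 2\Omega & \cdots & I_p\\ \vdots & & \ddots & \vdots\\ I_p & I_p & \cdots & 2I_p + 2\Omega\end{pmatrix} \qquad (n_1 p \times n_1 p),$$

and for $k \ne l$, $V_{kl} = I_{n_1 p}$.

We now wish to solve the determinantal equation $|V - \alpha I_{n_1 n_2 p}| = 0$. Using Lemma 2.2.1, with

$$A = \begin{pmatrix} (1-\alpha)I_p + 2\Omega & I_p & \cdots & I_p\\ I_p & (1-\alpha)I_p + 2\Omega & \cdots & I_p\\ \vdots & & \ddots & \vdots\\ I_p & I_p & \cdots & (1-\alpha)I_p + 2\Omega\end{pmatrix}$$

and $B = I_{n_1 p}$, we obtain $|V - \alpha I_{n_1 n_2 p}| = |A|^{n_2-1}\,|D_1|$, where

$$D_1 = \begin{pmatrix} (1-\alpha+n_2)I_p + 2\Omega & I_p & \cdots & I_p\\ I_p & (1-\alpha+n_2)I_p + 2\Omega & \cdots & I_p\\ \vdots & & \ddots & \vdots\\ I_p & I_p & \cdots & (1-\alpha+n_2)I_p + 2\Omega\end{pmatrix}.$$

But the roots of $|A|^{n_2-1} = 0$ are just the characteristic roots of the matrix $D_2 = I_{n_1}\otimes C_1 + J_{n_1}\otimes I_p$, where $C_1 = 2\Omega$, each occurring $n_2 - 1$ times. By Lemma 2.2.2, the characteristic roots of $D_2$ are, for $j = 1, \ldots, p$, $n_1 + 2\omega_j$, each occurring once, and $2\omega_j$, each occurring $n_1 - 1$ times, with $\omega_1, \ldots, \omega_p$ being the characteristic roots of $\Omega$. Similarly, the roots of $|D_1| = 0$ are the characteristic roots of $D_3 = I_{n_1}\otimes C_2 + J_{n_1}\otimes I_p$, where $C_2 = n_2 I_p + 2\Omega$, which by Lemma 2.2.2 consist of, for $j = 1, \ldots, p$, $n_1 + n_2 + 2\omega_j$, each occurring once, and $n_2 + 2\omega_j$, each occurring $n_1 - 1$ times. Thus the set $\{\alpha_1, \alpha_2, \ldots, \alpha_{n_1 n_2 p}\}$ consists of, for $j = 1, \ldots, p$,

$$\begin{array}{ll}
n_1 + n_2 + 2\omega_j & \text{each occurring once},\\
n_2 + 2\omega_j & \text{each occurring } n_1 - 1 \text{ times},\\
n_1 + 2\omega_j & \text{each occurring } n_2 - 1 \text{ times},\\
2\omega_j & \text{each occurring } (n_1 - 1)(n_2 - 1) \text{ times}.
\end{array}$$

But, with $\lambda_j = 1 + 2n_2^{-1}\omega_j$,

$$\frac{n_1 + n_2 + 2\omega_j}{n_1 n_2} = \frac{1}{n_2} + \frac{1 + 2n_2^{-1}\omega_j}{n_1} = \frac{1}{n_2} + \frac{\lambda_j}{n_1}, \qquad \frac{n_2 + 2\omega_j}{n_1 n_2} = \frac{1 + 2n_2^{-1}\omega_j}{n_1} = \frac{\lambda_j}{n_1},$$

$$\frac{n_1 + 2\omega_j}{n_1 n_2} = \frac{1}{n_2} + \frac{2n_2^{-1}\omega_j}{n_1} = \frac{1}{n_2} + \frac{\lambda_j - 1}{n_1}, \qquad \frac{2\omega_j}{n_1 n_2} = \frac{2n_2^{-1}\omega_j}{n_1} = \frac{\lambda_j - 1}{n_1}.$$

The result follows upon substitution, summing together the squared unit normal random variables having the same multipliers. □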
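The eigenvalue bookkeeping in this proof can be checked numerically. The NumPy sketch below assembles $V$ for small $n_1$, $n_2$ from the covariance rules listed above: $2I_p + 2\Omega$ for a vector with itself, $I_p$ when two vectors share an $X^{(1)}$ or an $X^{(2)}$, and $0$ otherwise. It then compares the eigenvalues of $V$ with the four families $n_1 + n_2 + 2\omega_j$, $n_2 + 2\omega_j$, $n_1 + 2\omega_j$, $2\omega_j$ and their multiplicities. The matrix $\Omega$, the sizes, and the function name are arbitrary illustrative choices.

```python
import numpy as np


def roots_theorem_321(Omega, n1, n2):
    """Return (numerical, predicted) sorted eigenvalues of V = Cov(W) in Theorem 3.2.1.
    W is ordered with j (group 2) outermost and i (group 1) inside."""
    p = Omega.shape[0]
    I1, I2, Ip = np.eye(n1), np.eye(n2), np.eye(p)
    J1, J2 = np.ones((n1, n1)), np.ones((n2, n2))
    V = (np.kron(np.kron(I2, I1), 2 * Omega)     # each W_ij with itself: the 2*Omega part
         + np.kron(np.kron(I2, J1), Ip)          # same j (shared X_j^(2)): I_p
         + np.kron(np.kron(J2, I1), Ip))         # same i (shared X_i^(1)): I_p
    numerical = np.sort(np.linalg.eigvalsh(V))
    predicted = []
    for w in np.linalg.eigvalsh(Omega):          # omega_j, roots of Omega
        predicted += [n1 + n2 + 2 * w]
        predicted += [n2 + 2 * w] * (n1 - 1)
        predicted += [n1 + 2 * w] * (n2 - 1)
        predicted += [2 * w] * ((n1 - 1) * (n2 - 1))
    return numerical, np.sort(np.array(predicted))


Omega = np.array([[0.3, 0.1], [0.1, 0.2]])
num, pred = roots_theorem_321(Omega, n1=3, n2=4)
print(np.allclose(num, pred))   # True
```

On the diagonal of $V$ the three Kronecker terms add up to $2I_p + 2\Omega$, as in the proof; reordering the vectors in $W$ would permute $V$ but leave its eigenvalues unchanged.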


With this characterization of $S_B$, we can see that, for fixed $n_1$, the effect of the error in position determination becomes less important as $n_2$ gets larger, because $2n_2^{-1}\Omega$ becomes smaller. Since the performance of the classification rule depends in some measure on the variance of the classification criterion, we would like that variance to decrease as $n_2$ increases. In fact, the dominating term, $\sum_{j=1}^{p}(n_2^{-1} + n_1^{-1}\lambda_j)W_{1j}$, does have decreasing variance for increasing $n_2$. The third and fourth terms in the sum in (3.2.1) may offset some of that reduction; however, their contribution to the overall variance should tend to be small, due to the smallness of $\lambda_j - 1$. The numerical example below tends to support this conclusion. Before giving the example, however, we note that the special case $n_2 = 1$ can be derived from the form of (3.2.1), since the third and fourth terms in the sum are then degenerate (zero degrees of freedom), and putting $n_2 = 1$ in the first two terms does yield the characterization in Theorem 2.2.1.


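Example: Let $\Omega = 0.1\,I_2$ and $n_1 = 10$. When $n_2 = 1$, $\operatorname{Var}(S_B) = \operatorname{Var}(1.12\chi^2_2 + 0.12\chi^2_{18}) = 5.536$. When $n_2 = 2$, $\operatorname{Var}(S_B) = \operatorname{Var}(0.61\chi^2_2 + 0.11\chi^2_{18} + 0.51\chi^2_2 + 0.01\chi^2_{18}) = 2.968$. When $n_2 = 4$, $\operatorname{Var}(S_B) = \operatorname{Var}(0.355\chi^2_2 + 0.105\chi^2_{18} + 0.255\chi^2_6 + 0.005\chi^2_{54}) = 1.684$.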
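The variances in this example follow from $\operatorname{Var}(\sum_i c_i\chi^2_{\nu_i}) = 2\sum_i c_i^2\nu_i$ for independent chi-squared variables, applied to (3.2.1) with $\lambda_j = 1 + 2n_2^{-1}\omega_j$. A short Python sketch that reproduces the three values (function names are illustrative):

```python
def var_chi2_combination(terms):
    """Var(sum c * chi^2_nu) for independent chi-squares, terms = [(c, nu), ...]."""
    return 2.0 * sum(c * c * nu for c, nu in terms)


def var_SB(omega, n1, n2):
    """Variance of S_B from representation (3.2.1); omega = characteristic roots of Omega."""
    terms = []
    for w in omega:
        lam = 1.0 + 2.0 * w / n2                              # root of I_p + 2 n2^{-1} Omega
        terms += [(1.0 / n2 + lam / n1, 1),                   # W_1j ~ chi^2_1
                  (lam / n1, n1 - 1),                         # W_2j ~ chi^2_{n1-1}
                  (1.0 / n2 + (lam - 1.0) / n1, n2 - 1),      # W_3j ~ chi^2_{n2-1}
                  ((lam - 1.0) / n1, (n1 - 1) * (n2 - 1))]    # W_4j ~ chi^2_{(n1-1)(n2-1)}
    return var_chi2_combination(terms)


for n2 in (1, 2, 4):
    print(n2, round(var_SB([0.1, 0.1], n1=10, n2=n2), 3))
# 1 5.536
# 2 2.968
# 4 1.684
```

For $n_2 = 1$ the $W_{3j}$ and $W_{4j}$ terms have zero degrees of freedom and drop out, matching the reduction to Theorem 2.2.1 noted above.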

Thus, if we consider the variance of the criterion as a measure of performance, we do appear to have better classification when more than one object is available to be classified. We would then also be interested in computing probabilities of misclassification. The exact distribution of $S_B$ will be rather complicated, and obtaining a practical computational scheme is difficult. However, we can approximate the distribution using the asymptotic expansion developed earlier. The procedure will be outlined briefly here.

For $n_2$ relatively small and $n_1$ large, we can write $S_B$ as $S_B = U + V$, where

$$U = \sum_{j=1}^{p}\left\{(n_2^{-1} + n_1^{-1}\lambda_j)W_{1j} + \left(n_2^{-1} + n_1^{-1}(\lambda_j - 1)\right)W_{3j}\right\},$$

$$V = \sum_{j=1}^{p}\left\{n_1^{-1}\lambda_j W_{2j} + n_1^{-1}(\lambda_j - 1)W_{4j}\right\}, \qquad (3.2.2)$$

with $W_{1j}$, $W_{3j}$ independent $\chi^2_1$ and $\chi^2_{n_2-1}$ random variables, and $W_{2j}$, $W_{4j}$ independent $\chi^2_{n_1-1}$ and $\chi^2_{(n_1-1)(n_2-1)}$ random variables. Thus $U$ is a quadratic form in normal variables with all multipliers of comparable size, and hence it will have a convergent Laguerre series expansion similar to that used in Theorem 2.3.1, the only changes being different formulas for $\beta$ and the constants $\{B_k\}$.

It is also easy to verify that the $k$-th cumulant of $V$ in (3.2.2) is of order $O(n_1^{1-k})$. Thus the conditions of Lemmas 2.3.1 and 2.3.2 are satisfied, and we can derive an asymptotic expansion with the same form as (2.3.9), but with $p$ replaced there by $n_2 p$. Everything else remains notationally the same, with

$$\beta = n_2^{-1} + n_1^{-1}\bar\lambda - n_1^{-1}n_2^{-1}(n_2 - 1), \qquad \bar\lambda = p^{-1}\sum_{j=1}^{p}\lambda_j,$$

$$B_k = \beta^{-k}\left\{\sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n_1} - \frac{n_2 - 1}{n_1 n_2}\right)^{k} + (n_2 - 1)\sum_{j=1}^{p}\left(\frac{\bar\lambda - \lambda_j}{n_1} + \frac{1}{n_1 n_2}\right)^{k}\right\},$$

and $EV = (1 - n_1^{-1})\left[n_2\sum_{j=1}^{p}\lambda_j - p(n_2 - 1)\right]$; $\mu_j$ is the $j$-th central moment of $V/\beta$, $V$ being the random variable in (3.2.2). We saw in the case of $S_0$ that the distribution was fairly insensitive to minor variations of the $\lambda$'s. The distribution of $S_B$ as given here should be even less sensitive to such variations, since the majority of the terms in the primary expansion (of $U$ in (3.2.2)) depend on $\lambda_j$ only through $\lambda_j - 1$, which is nearly zero.
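The constants entering the expansion for $S_B$ can be computed as in the sketch below, which evaluates $\beta$, $B_k$ and $EV$ from the formulas above for given $\lambda_j$, $n_1$, $n_2$. Two consistency checks follow from the construction. First, $\beta$ is the degrees-of-freedom-weighted mean of the multipliers in $U$, which forces $B_1 = 0$. Second, $EV$ agrees with the mean of $V$ in (3.2.2) computed term by term. The $\beta^{-k}$ normalization of $B_k$ follows the central-case convention $(\bar\lambda - \lambda_j)/(n + \bar\lambda) = (\bar\lambda - \lambda_j)/(n\beta)$. The function name and the numerical values are hypothetical.

```python
def sb_expansion_constants(lam, n1, n2, kmax):
    """beta, [B_1, ..., B_kmax] and EV for the expansion of S_B = U + V in (3.2.2)."""
    p = len(lam)
    lam_bar = sum(lam) / p
    beta = 1.0 / n2 + lam_bar / n1 - (n2 - 1) / (n1 * n2)
    B = []
    for k in range(1, kmax + 1):
        # beta - (multiplier of W_1j) and beta - (multiplier of W_3j), raised to the k-th power
        first = sum(((lam_bar - l) / n1 - (n2 - 1) / (n1 * n2)) ** k for l in lam)
        third = (n2 - 1) * sum(((lam_bar - l) / n1 + 1.0 / (n1 * n2)) ** k for l in lam)
        B.append((first + third) / beta ** k)
    EV = (1.0 - 1.0 / n1) * (n2 * sum(lam) - p * (n2 - 1))
    return beta, B, EV


lam, n1, n2 = [1.05, 1.02, 1.01], 20, 3
beta, B, EV = sb_expansion_constants(lam, n1, n2, kmax=4)
# Direct checks: beta as the df-weighted mean of the multipliers of U, EV term by term.
mult = [(1 / n2 + l / n1, 1) for l in lam] + [(1 / n2 + (l - 1) / n1, n2 - 1) for l in lam]
print(beta, sum(c * nu for c, nu in mult) / (n2 * len(lam)))
print(EV, sum(l / n1 * (n1 - 1) + (l - 1) / n1 * (n1 - 1) * (n2 - 1) for l in lam))
print(B[0])   # B_1 vanishes up to rounding
```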


3.3  Discrimination  between  two  populations. 


If we consider the case of classifying a single new object as coming from one of two known populations, usually termed discrimination, we can achieve results similar to those obtained in the less complicated cases already investigated. Since the distribution involved is even more complicated than in the cases considered thus far, we will only give a representation for the criterion and make a conjecture concerning a way of approximating its distribution.

Using the notation of paragraph (B) on pages 53-54, $X_1^{(k)}, \ldots, X_{n_k}^{(k)}$ are i.i.d. with $N_p(\mu_k, \Sigma)$ distributions. Let $X_0$ be an additional independent random variable distributed as $N_p(\mu_0, \Sigma)$, and let $Y_{0j}^{(i)}, Y_{j0}^{(i)}$, $j = 1, 2, \ldots, n_i$, $i = 1, 2$, be independent random variables distributed as $N_p(0, \Delta)$. Let

$$S_{0j}^{(i)} = (X_0 + Y_{0j}^{(i)} - X_j^{(i)} - Y_{j0}^{(i)})'\,\Sigma^{-1}\,(X_0 + Y_{0j}^{(i)} - X_j^{(i)} - Y_{j0}^{(i)}),$$

and $S_0^{(i)} = n_i^{-1}\sum_{j=1}^{n_i}S_{0j}^{(i)}$.

If we first consider, as we did earlier, the case with the contaminating variables removed, we find that classification based on the difference $S_0^{(1)} - S_0^{(2)}$ is the natural analogue of what would be used in the absence of contamination. The following theorem gives a representation of the criterion $S_0^{(1)} - S_0^{(2)}$. As indicated following the proof, it may be possible to use an expansion for the distribution of an indefinite quadratic form to generalize Lemma 2.3.1 and give an approximation to this distribution.

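Theorem 3.3.1. Let $S_0^{(1)}$ and $S_0^{(2)}$ be as described in the preceding discussion. Then the statistic $S_0^{(1)} - S_0^{(2)}$ can be represented as

$$S_0^{(1)} - S_0^{(2)} = \sum_{j=1}^{p}\left\{(1 + n_1^{-1}\lambda_j)(W_j^{(1)} + \eta_j^{(1)})^2 - (1 + n_2^{-1}\lambda_j)(W_j^{(2)} + \eta_j^{(2)})^2\right\} + \sum_{j=1}^{p}\left\{n_1^{-1}\lambda_j W_{1j}^{*} - n_2^{-1}\lambda_j W_{2j}^{*}\right\}, \qquad (3.3.1)$$

where $\lambda_1, \ldots, \lambda_p$ are the characteristic roots of $I_p + 2\Omega$, $\Omega = \Sigma^{-1/2}\Delta\Sigma^{-1/2}$; $\{W_j, W_{1j}^{*}, W_{2j}^{*}\}_{j=1,\ldots,p}$ are mutually independent random variables, $W_{ij}^{*}$ being chi-squared with $n_i - 1$ degrees of freedom and

$$W_j = \begin{pmatrix} W_j^{(1)}\\ W_j^{(2)}\end{pmatrix}$$

being bivariate normal with mean $0$ and covariance matrix

$$\begin{pmatrix} 1 & \gamma_j\\ \gamma_j & 1\end{pmatrix},$$

with $\gamma_j = (n_1 n_2)^{1/2}[(n_1 + \lambda_j)(n_2 + \lambda_j)]^{-1/2}$; and $\eta_j^{(i)}$ is as defined in (3.3.3) in the proof.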

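Proof: Letting $U_{0j}^{(i)} = \Sigma^{-1/2}(X_0 + Y_{0j}^{(i)} - X_j^{(i)} - Y_{j0}^{(i)})$, $U^{(i)\prime} = (U_{01}^{(i)\prime}, \ldots, U_{0n_i}^{(i)\prime})$, and $U' = (U^{(1)\prime}, U^{(2)\prime})$, we have, as in several previous theorems, that $U$ is normal with expected value $\theta$ and covariance matrix $V$, where

$$\theta = \begin{pmatrix}\theta^{(1)}\\ \theta^{(2)}\end{pmatrix}, \qquad V = \begin{pmatrix} V_{11} & V_{12}\\ V_{21} & V_{22}\end{pmatrix},$$

$\theta^{(i)}$ consists of $n_i$ copies of $\Sigma^{-1/2}(\mu_0 - \mu_i)$, $V_{ii}$ ($n_i p \times n_i p$) has $2I_p + 2\Omega$ in its diagonal blocks and $I_p$ in its off-diagonal blocks, and

$$V_{12} = V_{21}' = \begin{pmatrix} I_p & \cdots & I_p\\ \vdots & & \vdots\\ I_p & \cdots & I_p\end{pmatrix} \qquad (n_1 p \times n_2 p).$$

Let $\Lambda$ be the diagonal matrix containing the characteristic roots of $I_p + 2\Omega$ and $R$ the associated matrix of orthogonal characteristic vectors; i.e., $I_p + 2\Omega = R\Lambda R'$. Then by Lemma 2.2.3, we have $V_{ii} = Q_i D_i Q_i'$, where

$$Q_i = \begin{pmatrix} R/\sqrt{n_i} & R/\sqrt{2} & \cdots & R/\sqrt{n_i(n_i-1)}\\ R/\sqrt{n_i} & -R/\sqrt{2} & \cdots & R/\sqrt{n_i(n_i-1)}\\ \vdots & \vdots & & \vdots\\ R/\sqrt{n_i} & 0 & \cdots & -(n_i-1)R/\sqrt{n_i(n_i-1)}\end{pmatrix}, \qquad D_i = \begin{pmatrix} n_i I_p + \Lambda & 0 & \cdots & 0\\ 0 & \Lambda & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \Lambda\end{pmatrix},$$

both $n_i p \times n_i p$. Let

$$W = \begin{pmatrix} W^{(1)}\\ W^{(2)}\end{pmatrix} = \begin{pmatrix} D_1^{-1/2}Q_1' & 0\\ 0 & D_2^{-1/2}Q_2'\end{pmatrix}(U - \theta),$$

with the partitioning and indexing of $W$ the same as that of $U$. Then $W$ is normal with expected value $0$ and covariance matrix

$$V^{*} = \begin{pmatrix} V_{11}^{*} & V_{12}^{*}\\ V_{21}^{*} & V_{22}^{*}\end{pmatrix}, \qquad V_{ij}^{*} = D_i^{-1/2}Q_i'V_{ij}Q_jD_j^{-1/2}.$$

Thus $V_{ii}^{*} = I_{n_i p}$ and

$$V_{12}^{*} = V_{21}^{*\prime} = \begin{pmatrix} \Gamma & 0\\ 0 & 0\end{pmatrix} \quad (n_1 p \times n_2 p), \qquad \Gamma = \operatorname{diag}(\gamma_1, \ldots, \gamma_p), \quad \gamma_j = \left[\frac{n_1 n_2}{(n_1 + \lambda_j)(n_2 + \lambda_j)}\right]^{1/2}.$$

Hence

$$\Pr(S_0^{(1)} - S_0^{(2)} \le y) = \Pr(n_1^{-1}U^{(1)\prime}U^{(1)} - n_2^{-1}U^{(2)\prime}U^{(2)} \le y)$$

$$= \Pr\left[n_1^{-1}(W^{(1)} + D_1^{-1/2}Q_1'\theta^{(1)})'D_1(W^{(1)} + D_1^{-1/2}Q_1'\theta^{(1)}) - n_2^{-1}(W^{(2)} + D_2^{-1/2}Q_2'\theta^{(2)})'D_2(W^{(2)} + D_2^{-1/2}Q_2'\theta^{(2)}) \le y\right]. \qquad (3.3.2)$$

As in Theorem 3.1.1, $D_i^{-1/2}Q_i'\theta^{(i)}$ has nonzero elements only in its first $p$ positions, and among the elements of $W$, the only nonzero covariances are $\operatorname{cov}(W_{1j}^{(1)}, W_{1j}^{(2)}) = \gamma_j$ for $j = 1, \ldots, p$, where $W_{kj}^{(i)}$ denotes the $j$-th element of the $k$-th $p$-vector in $W^{(i)}$. Thus if we let

$$\eta^{(i)} = \begin{pmatrix}\eta_1^{(i)}\\ \vdots\\ \eta_p^{(i)}\end{pmatrix} = (I_p + n_i^{-1}\Lambda)^{-1/2}R'\Sigma^{-1/2}(\mu_0 - \mu_i), \qquad (3.3.3)$$

$$W_j^{(i)} = W_{1j}^{(i)}, \qquad W_{ij}^{*} = \sum_{k=2}^{n_i}(W_{kj}^{(i)})^2 \qquad (i = 1, 2;\; j = 1, 2, \ldots, p),$$

and substitute these quantities in (3.3.2), we obtain the desired representation. □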
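The covariance $\gamma_j$ between $W_j^{(1)}$ and $W_j^{(2)}$ arises because $U^{(1)}$ and $U^{(2)}$ both contain the same $X_0$, so $V_{12}$ consists entirely of blocks $I_p$. The NumPy sketch below forms $V_{12}^{*} = D_1^{-1/2}Q_1'V_{12}Q_2D_2^{-1/2}$ for small $n_1$, $n_2$. It then checks that $V_{12}^{*}$ equals $\mathrm{diag}(\gamma_1, \ldots, \gamma_p)$ in its leading $p \times p$ block and is zero elsewhere. It repeats the Helmert-type construction of $Q_i$; the helper names and numeric values are illustrative assumptions.

```python
import numpy as np


def helmert(n):
    """Orthogonal n x n matrix: first column 1/sqrt(n), then (1,...,1,-(k-1),0,...)/sqrt(k(k-1))."""
    H = np.zeros((n, n))
    H[:, 0] = 1.0 / np.sqrt(n)
    for k in range(2, n + 1):
        c = 1.0 / np.sqrt(k * (k - 1))
        H[: k - 1, k - 1] = c
        H[k - 1, k - 1] = -(k - 1) * c
    return H


def cross_covariance(Omega, n1, n2):
    p = Omega.shape[0]
    lam, R = np.linalg.eigh(np.eye(p) + 2 * Omega)       # I_p + 2 Omega = R Lambda R'

    def Vii(n):                                          # 2I_p + 2Omega on diagonal blocks, I_p elsewhere
        return np.kron(np.eye(n), np.eye(p) + 2 * Omega) + np.kron(np.ones((n, n)), np.eye(p))

    Q1, Q2 = np.kron(helmert(n1), R), np.kron(helmert(n2), R)
    d1 = np.diag(Q1.T @ Vii(n1) @ Q1)                    # diagonal of D_1
    d2 = np.diag(Q2.T @ Vii(n2) @ Q2)                    # diagonal of D_2
    V12 = np.kron(np.ones((n1, n2)), np.eye(p))          # Cov(U^(1), U^(2)) from the shared X_0
    V12_star = (Q1.T @ V12 @ Q2) / np.sqrt(np.outer(d1, d2))
    gamma = np.sqrt(n1 * n2 / ((n1 + lam) * (n2 + lam)))
    expected = np.zeros((n1 * p, n2 * p))
    expected[:p, :p] = np.diag(gamma)
    return V12_star, expected


V12_star, expected = cross_covariance(np.array([[0.2, 0.05], [0.05, 0.1]]), n1=4, n2=6)
print(np.allclose(V12_star, expected))   # True
```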

Of course, it is possible to make an additional transformation to obtain a representation eliminating the remaining nonzero covariances, by using the fact that

$$\begin{pmatrix} 1 & \gamma_j\\ \gamma_j & 1\end{pmatrix} = \frac12\begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}\begin{pmatrix} 1 + \gamma_j & 0\\ 0 & 1 - \gamma_j\end{pmatrix}\begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}.$$

For computational purposes this would be done; however, this final transformation leads to a notationally less convenient form.
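The decomposition above also gives a simple way to generate the correlated pair $(W_j^{(1)}, W_j^{(2)})$ from two independent unit normals. With $a = \sqrt{(1+\gamma_j)/2}$ and $b = \sqrt{(1-\gamma_j)/2}$, the pair $(aZ_1 + bZ_2,\ aZ_1 - bZ_2)$ has unit variances and covariance $a^2 - b^2 = \gamma_j$. The plain-Python sketch below (function names hypothetical) checks the matrix identity exactly and the sampling rule by simulation:

```python
import math
import random


def half_H_L_H(gamma):
    """Compute (1/2) H diag(1+gamma, 1-gamma) H with H = [[1, 1], [1, -1]]."""
    H = [[1.0, 1.0], [1.0, -1.0]]
    L = [1.0 + gamma, 1.0 - gamma]
    return [[0.5 * sum(H[i][k] * L[k] * H[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def correlated_pair(gamma, rng):
    """Draw (W1, W2) with unit variances and correlation gamma."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a, b = math.sqrt((1.0 + gamma) / 2.0), math.sqrt((1.0 - gamma) / 2.0)
    return a * z1 + b * z2, a * z1 - b * z2


gamma = 0.8
print(half_H_L_H(gamma))                                # [[1, 0.8], [0.8, 1]] up to rounding
rng = random.Random(3)
pairs = [correlated_pair(gamma, rng) for _ in range(50000)]
print(sum(w1 * w2 for w1, w2 in pairs) / len(pairs))   # close to 0.8
```

Equivalently, $(W_j^{(1)} + W_j^{(2)})/\sqrt{2(1+\gamma_j)}$ and $(W_j^{(1)} - W_j^{(2)})/\sqrt{2(1-\gamma_j)}$ are independent unit normals, which is the transformation the text alludes to.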

Although development of a computational scheme or approximation has not been attempted here, the similarity of the representation (3.3.1) to (3.1.1) and (2.2.3) suggests that an asymptotic expansion might be feasible. There is still a dominant term (which in this case tends to the difference between two noncentral chi-squared random variables) and an additional term which tends in probability to a constant (in this case zero). The dominant term is an indefinite quadratic form, and so the theory developed for the previous cases is not applicable. However, Press [20] gives a series expansion for the density of such a form which is similar to those used here for the positive definite forms. Thus it may be possible to generalize the results given here to that case or to prove similar results for the more general case.




3.4  Extensions  and  unsolved  problems. 

Several possible extensions of the results developed in Chapters 2 and 3 have already been indicated, together with conjectures as to how they might be approached. We wish to indicate here some other interesting and desirable generalizations.

One of the restrictions placed on the models under consideration in Chapters 2 and 3 was that the covariance matrices of the distributions of both the unobservable true positions and the error components be known (in the central cases in Chapter 2, only $\Omega = \Sigma^{-1/2}\Delta\Sigma^{-1/2}$ was needed). Certainly in many situations that assumption will not be satisfied. In the central cases, if we assume that $\Delta$ is a constant multiple of $I$ with the constant unknown, we would like to develop a classification procedure and study the distributions involved when the constant is estimated. Since, in this case, the expected value of the observed distances is a function of the unknown constant, it might be possible to construct an estimate based on the observed data and to obtain some useful results. Estimation of more general covariance structures, however, appears less promising. In that case we would require information about the individual characteristic roots of $\Omega$, while the expected value of the observed distances depends on the trace of $\Omega$, but not otherwise on the individual roots.

Of course, it would be desirable to eliminate as many of the assumptions on the distribution of the distances as possible and to analyze the problem with a nonparametric approach. Some attempts were made in this direction, with little success. The primary difficulty is that, under any reasonable model, the observed distances cannot be considered to be independent observations. Perhaps with some mild assumptions concerning the nature of the dependence, some useful nonparametric results could be obtained.




In a nonparametric approach, presumably no assumptions would be made concerning the dimensionality of the underlying model. The related problem of determination of the dimensionality based only on distance measurements among the objects would be another problem of interest.

Finally, application of the results obtained here to cluster analysis or scaling problems would be desirable. These applications would introduce more complexity because of their sequential nature. As was mentioned in the introduction, these problems motivated the original inquiry into analysis based on distances. If applications of these results in those areas could be developed, it would be gratifying.


4.  NONCENTRALITY  ESTIMATION 


4.1 Introduction.

In dealing with problems involving distance measurements, one often encounters the noncentral chi-squared distribution. We have seen several such instances in the preceding chapters. An even simpler example in which the distribution arises is the following:

Let $P_1$ and $P_2$ be points in euclidean two-space with coordinates $(x_{11}, x_{12})$ and $(x_{21}, x_{22})$, respectively. For $i, j = 1, 2$, let the error made in determining the coordinate $x_{ij}$ be $\epsilon_{ij}$, where the errors are independent and identically distributed $N(0, \sigma^2)$ random variables. Putting $y_{ij} = x_{ij} + \epsilon_{ij}$, the measured squared euclidean distance between $P_1$ and $P_2$, $\sum_{i=1}^{2}(y_{1i} - y_{2i})^2$, is distributed as $2\sigma^2$ times a noncentral chi-squared random variable with two degrees of freedom and noncentrality parameter $(2\sigma^2)^{-1}$ times the true squared distance between $P_1$ and $P_2$, $\sum_{i=1}^{2}(x_{1i} - x_{2i})^2$.
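A quick simulation makes this claim concrete. The sketch below (coordinates and $\sigma$ chosen arbitrarily) perturbs both points with independent $N(0, \sigma^2)$ errors and rescales the measured squared distance by $(2\sigma^2)^{-1}$. It then compares the sample mean and variance with those of $\chi^2_2(\lambda)$, namely $2 + \lambda$ and $2(2 + 2\lambda)$, where $\lambda$ is $(2\sigma^2)^{-1}$ times the true squared distance.

```python
import random


def scaled_measured_sq_dist(P1, P2, sigma, reps, seed=0):
    """Replicates of sum_i (y_1i - y_2i)^2 / (2 sigma^2), with y = x + N(0, sigma^2) error."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        d2 = sum(((a + rng.gauss(0.0, sigma)) - (b + rng.gauss(0.0, sigma))) ** 2
                 for a, b in zip(P1, P2))
        out.append(d2 / (2.0 * sigma ** 2))
    return out


P1, P2, sigma = (0.0, 0.0), (1.0, 2.0), 0.5
lam = sum((a - b) ** 2 for a, b in zip(P1, P2)) / (2.0 * sigma ** 2)   # = 10
sample = scaled_measured_sq_dist(P1, P2, sigma, reps=20000)
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
print(mean, 2 + lam)            # sample vs theoretical mean
print(var, 2 * (2 + 2 * lam))   # sample vs theoretical variance
```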

If we are interested, then, in making inferences concerning the true distances in such a situation involving measurement error or other similar types of errors, we are naturally led to the problem of estimation of the noncentrality parameter of a noncentral chi-squared random variable. A further complication of the problem can be introduced if we assume that our observations are not actually measurements of the true distances, but rather some monotonic transformation of those distances. Such an assumption is made, for example, in the multidimensional scaling analysis mentioned in Chapter 1. In this chapter, we shall introduce an estimation procedure based on the two-sample Wilcoxon-Mann-Whitney statistic, which can be used when the observed data are the result of such a monotonic transformation. We will investigate the properties of the resulting estimator and compare them with those of the maximum likelihood estimator, which would be applicable if the data were not transformed.


We shall need two representations of the density function of $\chi_p^2(\lambda)$, where $\chi_p^2(\lambda)$ stands for a noncentral chi-squared random variable with $p$ degrees of freedom and noncentrality parameter $\lambda$. (If $\lambda = 0$, we will continue to use $\chi_p^2$ instead of $\chi_p^2(0)$.) The first representation is

$$f_{p,\lambda}(u) = \sum_{j=0}^{\infty}(\lambda/2)^j(j!)^{-1}\exp(-\lambda/2)\,f_{p+2j}(u) \qquad (u > 0), \tag{4.1.1}$$

where

$$f_p(u) = \tfrac12(\tfrac12u)^{\frac12p-1}[\Gamma(\tfrac12p)]^{-1}\exp(-u/2) \qquad (u > 0) \tag{4.1.2}$$

is the central chi-squared density. The second representation is

$$f_{p,\lambda}(u) = \tfrac12(u/\lambda)^{\frac14(p-2)}\exp[-\tfrac12(\lambda+u)]\,I_{\frac12p-1}(\sqrt{\lambda u}) \qquad (u > 0), \tag{4.1.3}$$

where $I_\nu(x)$ is the modified Bessel function of the first kind of order $\nu$. These formulas are given, among other places, in Johnson and Kotz [11], chapter 28.
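The two representations can be checked against each other numerically. This is a sketch that uses only Python's standard library, and the function names are hypothetical. The Bessel function is summed directly from its power series. That is adequate for the moderate arguments used here, but it is not a production-quality implementation.

```python
import math

def chi2_pdf(u, p):
    """Central chi-squared density f_p(u) of (4.1.2), evaluated in logs (u > 0)."""
    nu = 0.5 * p
    return 0.5 * math.exp((nu - 1.0) * math.log(0.5 * u) - 0.5 * u - math.lgamma(nu))

def ncx2_pdf_mixture(u, p, lam, tol=1e-17):
    """f_{p,lambda}(u) from the Poisson mixture (4.1.1)."""
    weight, total, j = math.exp(-0.5 * lam), 0.0, 0
    while True:
        total += weight * chi2_pdf(u, p + 2 * j)
        j += 1
        weight *= 0.5 * lam / j            # Poisson(lambda/2) probability of j
        if j > 0.5 * lam and weight < tol:
            return total

def bessel_i(nu, x, tol=1e-17):
    """Modified Bessel function I_nu(x) summed from its power series (moderate x only)."""
    term = (0.5 * x) ** nu / math.gamma(nu + 1.0)
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= 0.25 * x * x / (k * (k + nu))
        total += term
    return total

def ncx2_pdf_bessel(u, p, lam):
    """f_{p,lambda}(u) from the Bessel form (4.1.3); requires lam > 0."""
    return (0.5 * (u / lam) ** (0.25 * (p - 2)) * math.exp(-0.5 * (lam + u))
            * bessel_i(0.5 * p - 1.0, math.sqrt(lam * u)))

for p in (2, 3):
    print(p, ncx2_pdf_mixture(1.5, p, 2.0), ncx2_pdf_bessel(1.5, p, 2.0))
```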

4.2 Maximum likelihood estimation.

In the situation in which the observed data are the actual measurements and not a transformation of them, maximum likelihood estimation would be applicable and the resulting estimator would have desirable properties. Although the estimator to be proposed will apply in the more general case with transformed measurements, it will be of interest to compare it with the maximum likelihood estimator in the restricted case. Thus we first wish to discuss some results of the maximum likelihood approach.

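Let $X_1, X_2, \dots, X_n$ be independent and identically distributed (i.i.d.) random variables with density function (4.1.3), where $p$ is known and $\lambda$ is unknown. Letting $q = \frac12p - 1$ and denoting the likelihood function by $L$, we have

$$L = 2^{-n}\lambda^{-\frac12nq}\prod_{j=1}^{n}\left\{x_j^{\frac12q}I_q(\sqrt{\lambda x_j})\right\}\exp\left\{-\frac12\sum_{j=1}^{n}(\lambda + x_j)\right\}.$$

Thus

$$\log L = -n\log 2 - \tfrac12nq\log\lambda + \sum_{j=1}^{n}\left\{\tfrac12q\log x_j + \log I_q(\sqrt{\lambda x_j})\right\} - \frac12\sum_{j=1}^{n}(\lambda + x_j). \tag{4.2.1}$$

Since

$$\frac{\partial}{\partial\lambda}I_q(\sqrt{\lambda u}) = \left[I_{q+1}(\sqrt{\lambda u}) + \frac{q}{\sqrt{\lambda u}}I_q(\sqrt{\lambda u})\right]\frac{\partial\sqrt{\lambda u}}{\partial\lambda} = \frac12\left(\frac{u}{\lambda}\right)^{1/2}I_{q+1}(\sqrt{\lambda u}) + \frac12\,\frac{q}{\lambda}\,I_q(\sqrt{\lambda u}), \tag{4.2.2}$$

we have from (4.2.1), upon differentiating, the likelihood equation

$$n\sqrt{\lambda} = \sum_{j=1}^{n}\frac{I_{q+1}(\sqrt{\lambda x_j})}{I_q(\sqrt{\lambda x_j})}\sqrt{x_j}. \tag{4.2.3}$$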

Thus, if (4.2.3) has a solution, it will provide us with the maximum likelihood estimator (MLE). The above derivation has been given by Meyer [18] for the special case of $p = 2$ and by Pandey and Rahman [19] in a modified form for the general case; it is given here for completeness. Both Meyer and Pandey and Rahman also give the following results (again for the case $p = 2$ in Meyer):

(a) if $\sum_{j=1}^{n}X_j > np$, then a unique solution of (4.2.3) exists and gives the MLE of $\lambda$;

(b) if $\sum_{j=1}^{n}X_j \le np$, the MLE of $\lambda$ is 0;

(c) if $\lambda > 0$, then $\lim_{n\to\infty}\Pr(\sum_{j=1}^{n}X_j > np) = 1$; that is, as $n \to \infty$, the probability approaches one that the MLE is based on the observations through equation (4.2.3).
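The likelihood equation (4.2.3) has no explicit solution, but it is easy to solve numerically. The following is a minimal sketch. It is not Meyer's or Pandey and Rahman's algorithm. It brackets the root in $s = \sqrt{\lambda}$ and bisects, relying on result (a) above for uniqueness. The names `bessel_ratio`, `mle_noncentrality` and `draw_ncx2` are hypothetical. The simulated data use the representation of $\chi_p^2(\lambda)$ as a sum of $p$ squared unit normals, the first shifted by $\sqrt{\lambda}$.

```python
import math
import random

def bessel_ratio(q, z, tol=1e-17):
    """I_{q+1}(z) / I_q(z) for q >= 0, from power series with the factor (z/2)**q cancelled.

    Plain series summation: adequate for the moderate z met here, not for large z.
    """
    def scaled_series(nu):
        # sum_k (z^2/4)^k / (k! Gamma(k + nu + 1))
        term = 1.0 / math.gamma(nu + 1.0)
        total, k = term, 0
        while term > tol * total:
            k += 1
            term *= 0.25 * z * z / (k * (k + nu))
            total += term
        return total
    return 0.5 * z * scaled_series(q + 1.0) / scaled_series(q)

def mle_noncentrality(xs, p, rel_tol=1e-12):
    """Root of the likelihood equation (4.2.3), found by bisection on s = sqrt(lambda).

    Returns 0 when sum(xs) <= n*p, following result (b) above.
    """
    n, q = len(xs), 0.5 * p - 1.0
    if sum(xs) <= n * p:
        return 0.0
    roots = [math.sqrt(x) for x in xs]

    def h(s):
        # right-hand side minus left-hand side of (4.2.3), with s = sqrt(lambda)
        return sum(bessel_ratio(q, s * r) * r for r in roots) - n * s

    # h > 0 near s = 0 when sum(xs) > n*p; h(hi) < 0 because the Bessel ratio is below 1
    lo, hi = 1e-8, sum(roots) / n
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

def draw_ncx2(rng, p, lam):
    """One chi^2_p(lam) draw: p squared unit normals, the first shifted by sqrt(lam)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    z[0] += math.sqrt(lam)
    return sum(v * v for v in z)

rng = random.Random(1)
p, lam_true = 3, 2.0
xs = [draw_ncx2(rng, p, lam_true) for _ in range(400)]
lam_star = mle_noncentrality(xs, p)
print(f"MLE of lambda: {lam_star:.3f} (true value {lam_true})")
```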

In investigating the efficiency of the alternative estimator to be proposed, we need the asymptotic distribution of the MLE of $\lambda$. It is well known that under certain regularity conditions, which can be shown to be satisfied in this case, the MLE is asymptotically normal, unbiased, and efficient, i.e., it attains the Cramér-Rao bound. The following theorem states these results more precisely, without proof, and gives the form of the asymptotic variance.


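Theorem 4.2.1. Let $\lambda^*$ be the MLE, based on a random sample of $n$ observations, of the noncentrality parameter of a $\chi_p^2(\lambda)$ random variable. Then $\sqrt{n}(\lambda^* - \lambda)$ is asymptotically normally distributed with mean zero and variance $V_p^*(\lambda)$, where (with $q = \frac12p - 1$)

$$[V_p^*(\lambda)]^{-1} = \frac14\int_0^\infty\frac{u}{\lambda}\left[\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}\right]^2 f_{p,\lambda}(u)\,du - \frac14.$$

Proof: As mentioned prior to the statement of the theorem, it is well known that under the conditions of this theorem, $\sqrt{n}(\lambda^* - \lambda)$ is asymptotically normally distributed with mean zero and variance

$$\left\{E\left[\left(\frac{\partial}{\partial\lambda}\log f_{p,\lambda}(u)\right)^2\right]\right\}^{-1}$$

(see, e.g., Kendall and Stuart [14], section 18.16). Thus we have only to show that

$$[V_p^*(\lambda)]^{-1} = E\left[\left(\frac{\partial}{\partial\lambda}\log f_{p,\lambda}(u)\right)^2\right]. \tag{4.2.4}$$

From (4.1.3) and (4.2.2) we have

$$\frac{\partial}{\partial\lambda}\log f_{p,\lambda}(u) = -\frac12 + \frac12\left(\frac{u}{\lambda}\right)^{1/2}\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}.$$

Thus

$$E\left[\left(\frac{\partial}{\partial\lambda}\log f_{p,\lambda}(u)\right)^2\right] = \frac14 - \frac12E\left\{\left(\frac{u}{\lambda}\right)^{1/2}\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}\right\} + \frac14E\left\{\frac{u}{\lambda}\left[\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}\right]^2\right\},$$

where the expectation is taken with respect to $f_{p,\lambda}(u)$. But

$$\left(\frac{u}{\lambda}\right)^{1/2}\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}f_{p,\lambda}(u) = f_{p+2,\lambda}(u),$$

so that the first expectation equals $\int_0^\infty f_{p+2,\lambda}(u)\,du = 1$, and

$$\frac{u}{\lambda}\left[\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}\right]^2 f_{p,\lambda}(u) = \left(\frac{u}{\lambda}\right)^{1/2}\frac{I_{q+1}(\sqrt{\lambda u})}{I_q(\sqrt{\lambda u})}f_{p+2,\lambda}(u).$$

The result follows on substitution in (4.2.4). □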


4.3 A randomized estimation procedure.

While the maximum likelihood estimator has many desirable properties, an explicit solution to (4.2.3) does not exist. Of course, since $I_q(x)$ is a known, tabulated function, the equation can be solved numerically. However, from the computational standpoint an estimator which is an explicit function of the sample values is often preferable. In addition, we would like to be able to estimate $\lambda$ even if the observations have been subjected to a monotone transformation. In this section, we will develop an alternative estimation procedure, based on the two-sample Wilcoxon-Mann-Whitney statistic, which satisfies these requirements.

Before introducing the proposed estimator, we shall give some general results on the Wilcoxon-Mann-Whitney statistic, which we will use later. (Because of varying ways of identifying the samples and sample sizes, several of the results given here appear slightly different from the way they appear in the cited sources.)


Theorem 4.3.1. Let $X_1, \dots, X_n$; $Y_1, \dots, Y_m$ be independent sets of i.i.d. random variables, the distributions of $X_j$ and $Y_i$ being (possibly) different. Let

$$U = \sum_{i=1}^{m}\sum_{j=1}^{n}c(Y_i - X_j),$$

where $c(x) = 1$ or $0$ according as $x > 0$ or $x \le 0$. Then the following hold:

(a) $U$ is an unbiased estimator of $mn\theta$, where $\theta = \Pr(X_1 < Y_1)$;

(b) if $m/n \to k$, a positive constant, as $m, n \to \infty$, then $U/mn$ is asymptotically normally distributed;

(c) $\mathrm{Var}(U) = mn[(m-1)\phi^2 + (n-1)\gamma^2 + \theta(1-\theta)]$, where

$$\phi^2 = \Pr(X_1 < Y_1,\ X_1 < Y_2) - \theta^2, \qquad \gamma^2 = \Pr(X_1 < Y_1,\ X_2 < Y_1) - \theta^2;$$

(d) $\mathrm{Var}(U) \le mn\,\theta(1-\theta)\max(m, n)$.

Proof: (a) and (b) follow directly from the theory of U-statistics (see, e.g., Fraser [6]); (c) and (d) are given by Van Dantzig [26]. □
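The statistic and variance formula of Theorem 4.3.1 translate directly into code. This is a hypothetical sketch, and the helper names are not from the text. The usage lines reproduce the classical null variance $mn(m+n+1)/12$ as a check on part (c).

```python
from bisect import bisect_left

def mann_whitney_u(xs, ys):
    """U of Theorem 4.3.1: the number of pairs (i, j) with Y_i > X_j.

    Sorting the X's lets each Y_i be scored by binary search, O((m + n) log n) overall.
    bisect_left counts the X's strictly below y, so a tie Y_i = X_j scores 0, matching c(0) = 0.
    """
    xs_sorted = sorted(xs)
    return sum(bisect_left(xs_sorted, y) for y in ys)

def var_u(m, n, theta, phi2, gamma2):
    """Var(U) from part (c) of Theorem 4.3.1."""
    return m * n * ((m - 1) * phi2 + (n - 1) * gamma2 + theta * (1.0 - theta))

# Hypothetical data: Y = 1.0 beats 0.3 and 0.9, Y = 2.5 beats all four X's, Y = 0.1 beats none
xs, ys = [0.3, 1.7, 2.2, 0.9], [1.0, 2.5, 0.1]
print(mann_whitney_u(xs, ys))   # 6 pairs with Y_i > X_j
# If X and Y share one continuous distribution: theta = 1/2 and phi^2 = gamma^2 = 1/3 - 1/4 = 1/12,
# so (c) reduces to the familiar mn(m + n + 1)/12
print(var_u(5, 7, 0.5, 1 / 12, 1 / 12), 5 * 7 * 13 / 12)
```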


We return now to the specific problem of estimating the noncentrality parameter of a noncentral chi-squared random variable. Suppose that $X_1, \dots, X_n$ are i.i.d. random variables with density (4.1.1); i.e., $X_j$ is distributed as $\chi_p^2(\lambda)$. Suppose further that $Y_1, \dots, Y_m$ are i.i.d. random variables with density (4.1.2); i.e., $Y_i$ is distributed as $\chi_p^2$. In practice, only $X_1, X_2, \dots, X_n$ may be available as observed data. In that case, $Y_1, \dots, Y_m$ could be generated numerically or taken from a table of random deviates. Since a $\chi_p^2$ random variable is the sum of $p$ independent squared unit normal random variables, widely available tables of random normal deviates could be used to generate the $Y_i$'s if necessary. Hence the terminology randomized estimator. Alternatively, both the $X_j$'s and $Y_i$'s may be observed data, the $Y_i$'s being thought of as "control estimates" of the noncentrality parameter zero. In this case, both the $X_j$'s and $Y_i$'s could be assumed to have been transformed by the same monotonic (increasing) transformation without affecting the remaining analysis. If the $Y_i$'s are generated randomly, $m$ can be chosen for convenience to be an exact multiple of $n$, say $m = rn$; while not necessary for the remaining analysis, such a choice simplifies several of the results and will be assumed here. Let


$$t = (mn)^{-1}U, \tag{4.3.1}$$

where $U$ is as defined in Theorem 4.3.1. It follows immediately from (a) of that theorem that $t$ is an unbiased estimator of $\theta = \Pr(X_1 < Y_1)$. It follows from (d) that $t$ is consistent for $\theta$ as $n \to \infty$, since $\mathrm{Var}(t) = \mathrm{Var}(U)/(mn)^2 \le \theta(1-\theta)/\min(m, n)$. In order to derive an estimator for the unknown parameter $\lambda$, it will be sufficient to express $\lambda$ as a known function of $\theta$. We now proceed to determine that function.


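Using (4.1.1) and (4.1.2), we can express $\theta$ as

$$\theta = \Pr(X_1 < Y_1) = \int_{y=0}^{\infty}\int_{x=0}^{y}f_{p,\lambda}(x)f_p(y)\,dx\,dy = \int_{y=0}^{\infty}\int_{x=0}^{y}\sum_{j=0}^{\infty}(\tfrac12\lambda)^j(j!)^{-1}\exp(-\tfrac12\lambda)f_{p+2j}(x)f_p(y)\,dx\,dy.$$

Since all terms are positive and $0 < \theta < 1$, we may interchange the summation and integration, obtaining

$$\theta = \sum_{j=0}^{\infty}(\tfrac12\lambda)^j(j!)^{-1}\exp(-\tfrac12\lambda)\int_{y=0}^{\infty}\int_{x=0}^{y}f_{p+2j}(x)f_p(y)\,dx\,dy.$$

But each integral in the above equation is a function of $p$ and $j$ only, $\alpha_{p,j}$ say, and in fact can be expressed as an incomplete beta ratio. If $U \sim \chi_{p+2j}^2$ and $Y \sim \chi_p^2$ are independent, then $U/(U+Y)$ has a beta distribution with parameters $\frac12p + j$ and $\frac12p$, so that

$$\alpha_{p,j} = \int_{y=0}^{\infty}\int_{x=0}^{y}f_{p+2j}(x)f_p(y)\,dx\,dy = \Pr\left(\frac{U}{U+Y} < \frac12\right) = I_{1/2}(\tfrac12p + j, \tfrac12p),$$

where $I_x(a, b)$ is the incomplete beta ratio. Letting

$$g_p(\lambda) = \sum_{j=0}^{\infty}\alpha_{p,j}(\tfrac12\lambda)^j(j!)^{-1}\exp(-\tfrac12\lambda), \tag{4.3.2}$$

we can now express $\theta$ as an explicit, known function of $\lambda$: $\theta = g_p(\lambda)$. However, we require $\lambda$ as a function of $\theta$. We now proceed to investigate the properties of $g_p(\lambda)$ and show that a single-valued inverse function exists.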

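To show that such an inverse function exists, it will be sufficient to show that the first derivative of $g_p(\lambda)$ is negative for all $\lambda$. From (4.3.2) we have

$$g_p'(\lambda) = \sum_{j=0}^{\infty}\alpha_{p,j}\frac{d}{d\lambda}\left\{(\tfrac12\lambda)^j(j!)^{-1}\exp(-\tfrac12\lambda)\right\} = \tfrac12e^{-\frac12\lambda}\sum_{j=0}^{\infty}(\tfrac12\lambda)^j(j!)^{-1}(\alpha_{p,j+1} - \alpha_{p,j}). \tag{4.3.3}$$

Since $\alpha_{p,j} = I_{1/2}(\frac12p + j, \frac12p)$, it follows from the recurrence properties of the incomplete beta ratio that

$$\alpha_{p,j} - \alpha_{p,j+1} = \frac{\Gamma(p+j)}{\Gamma(\frac12p+j+1)\Gamma(\frac12p)}\left(\frac12\right)^{p+j} > 0, \tag{4.3.4}$$

and therefore $g_p'(\lambda) < 0$ for all $\lambda$. Thus for $p \ge 2$, $g_p(\lambda)$ is monotonically decreasing and therefore $g_p^{-1}(\theta)$ exists.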
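The series (4.3.3), with the differences (4.3.4) substituted, gives a direct way to evaluate $g_p'(\lambda)$ and to see that it is negative. The sketch below is illustrative. The truncation point of the Poisson sum is an assumption, and `g_prime` is a hypothetical name. The closed forms for $g_2$ and $g_4$ given below can be differentiated to check it. They yield $g_2'(\lambda) = -\frac18e^{-\lambda/4}$ and $g_4'(\lambda) = -e^{-\lambda/4}(\frac{3}{32} + \frac{\lambda}{128})$.

```python
import math

def g_prime(p, lam):
    """g_p'(lambda) by (4.3.3) and (4.3.4): -1/2 times a Poisson(lambda/2) average of the
    positive differences alpha_{p,j} - alpha_{p,j+1}, so the result is always negative."""
    j_max = int(lam + 10.0 * math.sqrt(lam + 1.0) + 40)   # truncation point: an assumption
    weight, total = math.exp(-0.5 * lam), 0.0
    for j in range(j_max + 1):
        diff = math.exp(math.lgamma(p + j) - math.lgamma(0.5 * p + j + 1)
                        - math.lgamma(0.5 * p) - (p + j) * math.log(2.0))   # (4.3.4)
        total += weight * diff
        weight *= 0.5 * lam / (j + 1)
    return -0.5 * total

for lam in (0.0, 1.0, 4.0):
    print(lam, [round(g_prime(p, lam), 6) for p in (2, 3, 4, 8)])
```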


For even values of $p$, the function $g_p(\lambda)$ has a closed form, which can be derived using, once again, the recurrence properties of the incomplete beta ratio. In particular, we have

$$g_2(\lambda) = \tfrac12\exp(-\tfrac14\lambda),$$

$$g_4(\lambda) = \tfrac12\exp(-\tfrac14\lambda)\left\{1 + \frac{\lambda}{16}\right\},$$

$$g_6(\lambda) = \tfrac12\exp(-\tfrac14\lambda)\left\{1 + \frac{3\lambda}{32} + \frac{\lambda^2}{512}\right\}.$$

In general, if $q = \frac12p - 1$ is a non-negative integer, then $g_p(\lambda) = g_2(\lambda)P_q(\lambda)$, where $P_q(\lambda)$ is a polynomial of degree $q$ in $\lambda$ such that $P_q(0) = 1$. While $g_p^{-1}(\cdot)$ has a closed form only for $p = 2$, namely $g_2^{-1}(\theta) = -4\log(2\theta)$, it is a simple matter to construct tables of $g_p^{-1}(\cdot)$ for any value of $p$. Such tables of $g_p^{-1}(\cdot)$ are included in the appendix for several values of $p$.

Having $g_p^{-1}(\theta)$ either in explicit or tabular form, we are now in a position to define our randomized estimator of the unknown noncentrality parameter $\lambda$.
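A table of $g_p^{-1}$ can be produced by evaluating (4.3.2) and inverting it numerically. This is a minimal sketch under stated assumptions, namely the series truncation point and the bisection tolerance. `g` and `g_inverse` are hypothetical names. The printed values are computed by the code, not copied from the appendix tables. The closed forms above serve as checks.

```python
import math

def g(p, lam):
    """theta = g_p(lambda) from (4.3.2), with alpha_{p,j} generated by the recurrence (4.3.4).

    alpha_{p,0} = 1/2, since X and Y are identically distributed when j = 0.
    """
    j_max = int(lam + 10.0 * math.sqrt(lam + 1.0) + 40)   # truncation point: an assumption
    alpha, weight, total = 0.5, math.exp(-0.5 * lam), 0.0
    for j in range(j_max + 1):
        total += weight * alpha
        alpha -= math.exp(math.lgamma(p + j) - math.lgamma(0.5 * p + j + 1)
                          - math.lgamma(0.5 * p) - (p + j) * math.log(2.0))
        weight *= 0.5 * lam / (j + 1)
    return total

def g_inverse(p, theta, rel_tol=1e-12):
    """Solve g_p(lambda) = theta by bisection, using that g_p decreases from 1/2 at lambda = 0.

    theta >= 1/2 is mapped to 0, the truncation at zero suggested after Theorem 4.3.2.
    """
    if theta >= 0.5:
        return 0.0
    lo, hi = 0.0, 1.0
    while g(p, hi) > theta:          # grow the bracket until g_p(hi) < theta
        hi *= 2.0
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(p, mid) > theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A small computed table of g_p^{-1}(theta) for a few p
for theta in (0.45, 0.4, 0.3, 0.2):
    print(theta, [round(g_inverse(p, theta), 4) for p in (2, 3, 4, 6)])
```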


Theorem 4.3.2. Let $t$ be as defined in (4.3.1) and $g_p(\lambda)$ as defined in (4.3.2). Then $\hat\lambda = g_p^{-1}(t)$ is an estimator of $\lambda$ with the following properties:

(a) $\hat\lambda$ is consistent for $\lambda$ as $n \to \infty$;

(b) $\hat\lambda$ is asymptotically unbiased;

(c) $\sqrt{n}(\hat\lambda - \lambda)$ is asymptotically normally distributed with mean zero and variance

$$V_p(\lambda) = \left[\frac{d}{d\theta}g_p^{-1}(\theta)\right]^2\left[\phi^2 + r^{-1}\gamma^2\right],$$

where $\phi^2$ and $\gamma^2$ are as defined in Theorem 4.3.1 and $r = m/n$.

Proof: All three statements will follow if we can show that $\sqrt{n}(\hat\lambda - \lambda)$ has an asymptotic normal distribution with mean zero and variance as given in (c). Since we assume $r = m/n$ is a constant, it follows from (a) and (b) of Theorem 4.3.1 that $\sqrt{n}(t - \theta)$ is asymptotically normally distributed with zero mean. From (c) of that theorem, with $m = rn$, we have

$$\mathrm{Var}(\sqrt{n}\,t) = \mathrm{Var}[(m\sqrt{n})^{-1}U] = (r^2n^3)^{-1}\mathrm{Var}(U) = \phi^2 + r^{-1}\gamma^2 + (rn)^{-1}\left[\theta(1-\theta) - \phi^2 - \gamma^2\right].$$

Hence $\lim_{n\to\infty}\mathrm{Var}(\sqrt{n}\,t) = \phi^2 + r^{-1}\gamma^2$, and it follows that this is the asymptotic variance of $\sqrt{n}(t - \theta)$. Since $g_p'(\lambda)$ exists and is negative for all $\lambda$, it follows that $g_p^{-1}(\theta)$ is also differentiable. Hence, using 6a.2.1 of Rao [21], we have that

$$\sqrt{n}\left[g_p^{-1}(t) - g_p^{-1}(\theta)\right] = \sqrt{n}(\hat\lambda - \lambda)$$

is asymptotically normally distributed with zero mean and variance

$$V_p(\lambda) = \left[\frac{d}{d\theta}g_p^{-1}(\theta)\right]^2\left[\phi^2 + r^{-1}\gamma^2\right].$$

Thus statements (b) and (c) are proved, and because $V_p(\lambda)$ does not depend on $n$, statement (a) also follows. □


There are several points which should be mentioned concerning the estimator $\hat\lambda$. First, it can easily be shown that $g_p^{-1}(\cdot)$ is a convex function. Since $t$ is unbiased for $\theta$, it follows by Jensen's inequality that $\hat\lambda$ is not an unbiased estimator for small samples, but rather $E\hat\lambda > \lambda$. In addition to being biased for small samples, the estimator also has a rather formidable exact distribution. However, the maximum likelihood estimator suffers from these same problems, and $\hat\lambda$ is much easier to compute. $\hat\lambda$ also can be used with transformed data and will still yield a valid estimate of $\lambda$. Second, there is a possibility of obtaining negative estimates of $\lambda$ with $\hat\lambda$, while $\lambda^*$ cannot be negative. One solution to this difficulty would be to replace any negative estimate of $\lambda$ with $\hat\lambda = 0$. In any case, $\hat\lambda < 0$ only if $t > \frac12$, and $\Pr(t > \frac12) \to 0$ if $\lambda > 0$, due to the consistency of $t$. Thus this difficulty will be negligible for large $n$.
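The randomized estimator of Theorem 4.3.2 can be sketched end to end. Everything here is illustrative. The function names, the seeds, and the choice $p = 3$, $\lambda = 2$, $n = 2000$, $r = 4$ are assumptions. The control sample is generated from squared unit normals, as suggested in the text. A square-root transform of both samples shows the invariance to monotone transformations. It corresponds to observing distances instead of squared distances.

```python
import math
import random
from bisect import bisect_left

def g(p, lam):
    """g_p(lambda) from (4.3.2), with alpha_{p,j} built by the recurrence (4.3.4)."""
    j_max = int(lam + 10.0 * math.sqrt(lam + 1.0) + 40)   # truncation point: an assumption
    alpha, weight, total = 0.5, math.exp(-0.5 * lam), 0.0
    for j in range(j_max + 1):
        total += weight * alpha
        alpha -= math.exp(math.lgamma(p + j) - math.lgamma(0.5 * p + j + 1)
                          - math.lgamma(0.5 * p) - (p + j) * math.log(2.0))
        weight *= 0.5 * lam / (j + 1)
    return total

def g_inverse(p, theta, rel_tol=1e-10):
    """Bisection for g_p(lambda) = theta; theta >= 1/2 gives 0 (negative estimates set to 0)."""
    if theta >= 0.5:
        return 0.0
    lo, hi = 0.0, 1.0
    while g(p, hi) > theta:
        hi *= 2.0
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(p, mid) > theta else (lo, mid)
    return 0.5 * (lo + hi)

def chi2_draw(rng, p, lam=0.0):
    """chi^2_p(lam) draw: p squared unit normals, the first shifted by sqrt(lam)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    z[0] += math.sqrt(lam)
    return sum(v * v for v in z)

def randomized_estimate(xs, p, r, rng, transform=None):
    """lambda-hat = g_p^{-1}(t), t = U/(mn), with m = r*n central chi^2_p controls drawn from rng.

    An optional strictly increasing `transform` is applied to both samples; it cannot
    change any comparison Y_i > X_j, hence it leaves t and lambda-hat unchanged.
    """
    n = len(xs)
    ys = [chi2_draw(rng, p) for _ in range(r * n)]
    if transform is not None:
        xs = [transform(x) for x in xs]
        ys = [transform(y) for y in ys]
    xs_sorted = sorted(xs)
    u = sum(bisect_left(xs_sorted, y) for y in ys)   # U of Theorem 4.3.1
    return g_inverse(p, u / (len(ys) * n))

# Hypothetical experiment: p = 3, lambda = 2, n = 2000 observations, r = 4
p, lam_true, n, r = 3, 2.0, 2000, 4
data_rng = random.Random(7)
xs = [chi2_draw(data_rng, p, lam_true) for _ in range(n)]
est_sq = randomized_estimate(xs, p, r, random.Random(11))
# Same controls (same seed), but both samples observed as distances rather than squared distances
est_root = randomized_estimate(xs, p, r, random.Random(11), transform=math.sqrt)
print(f"lambda-hat = {est_sq:.3f}; after a square-root transform: {est_root:.3f}")
```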

We would now like to examine the efficiency of $\hat\lambda$ with respect to $\lambda^*$. Certainly $\hat\lambda$ will be no more efficient than $\lambda^*$, since $\lambda^*$ has asymptotically minimum variance. However, due to the other desirable properties of $\hat\lambda$, a statistically less efficient estimator could be acceptable. In the following section, we will investigate the asymptotic relative efficiency of $\hat\lambda$ and $\lambda^*$.


4.4 Asymptotic relative efficiency of $\hat\lambda$ and $\lambda^*$.

Since $\sqrt{n}(\hat\lambda - \lambda)$ and $\sqrt{n}(\lambda^* - \lambda)$ both have limiting normal distributions with zero mean and constant variance, we can use the ratio of the asymptotic variances of $\sqrt{n}\,\lambda^*$ and $\sqrt{n}\,\hat\lambda$,

$$e_p(\lambda) = \frac{V_p^*(\lambda)}{V_p(\lambda)},$$

as a measure of the efficiency of $\hat\lambda$ relative to $\lambda^*$. Expressions for the two variances involved in $e_p(\lambda)$ have been given in the previous two sections. Unfortunately, neither $V_p^*(\lambda)$ nor $V_p(\lambda)$ has a simple closed form. Consequently it is necessary to use numerical techniques to evaluate $e_p(\lambda)$. This section contains methods for the numerical evaluation of $V_p^*(\lambda)$ and $V_p(\lambda)$. Tables of these quantities for selected values of $p$ and $\lambda$ are included in the Appendix. We restrict our attention here to $\lambda \le 5$.

From Theorem 4.2.1, we have

$$V_p^*(\lambda) = \left\{-\frac14 + \frac18\lambda^{-\frac12q-1}\exp(-\tfrac12\lambda)J_{q+1}^*(\lambda)\right\}^{-1}, \tag{4.4.1}$$

where

$$J_{q+1}^*(\lambda) = \int_0^\infty u^{\frac12q+1}\frac{[I_{q+1}(\sqrt{\lambda u})]^2}{I_q(\sqrt{\lambda u})}e^{-\frac12u}\,du. \tag{4.4.2}$$

$I_q(x)$ can be approximated accurately by a polynomial (see, e.g., Abramowitz and Stegun [1], p. 378). Also $I_{1/2}(x) = (2/\pi x)^{1/2}\sinh(x)$. Using one of these expressions and a continued fraction based on the recurrence relation

$$I_{q+1}(x) = I_{q-1}(x) - 2(q/x)I_q(x),$$

we can calculate the value of $I_q(x)$ for any value of $q = \frac12p - 1$ when $p$ is an integer greater than or equal to 2. Thus we can calculate the integrand in (4.4.2) for any values of $q$, $\lambda$ and $u$ we require. The value of $J_{q+1}^*(\lambda)$ can then be approximated using Gaussian quadrature with Laguerre polynomials (see, e.g., Abramowitz and Stegun [1], p. 890). Tables A.2(a)-(g) give values of $V_p^*(\lambda)$, computed as described above, using 10-point quadrature, for $p = 2(1)8$ and $\lambda = 0(.1)2(.5)5$.
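Theorem 4.2.1 and (4.4.1)-(4.4.2) can be evaluated numerically with the standard library alone. This sketch uses composite Simpson integration over a truncated range in place of the 10-point Gauss-Laguerre rule described above. Its values therefore need not agree with Tables A.2 to every digit. The truncation point and the function names are assumptions. Two checks are derived here and are not in the text. First, as $\lambda \to 0$, $[V_p^*(\lambda)]^{-1}$ tends to $\frac14(p+2)/p - \frac14 = 1/(2p)$, so $V_p^*(\lambda) \to 2p$. Second, $V_p^*(\lambda)$ cannot exceed $2(p + 2\lambda)$, which is the per-observation variance of the unbiased moment estimator $\bar X - p$.

```python
import math

def bessel_ratio(q, z, tol=1e-17):
    """I_{q+1}(z) / I_q(z) for q >= 0 by power series, the factor (z/2)**q cancelled."""
    def scaled_series(nu):
        term = 1.0 / math.gamma(nu + 1.0)
        total, k = term, 0
        while term > tol * total:
            k += 1
            term *= 0.25 * z * z / (k * (k + nu))
            total += term
        return total
    return 0.5 * z * scaled_series(q + 1.0) / scaled_series(q)

def ncx2_pdf(u, p, lam):
    """f_{p,lambda}(u), u > 0, from the Poisson mixture (4.1.1)."""
    weight, total, j = math.exp(-0.5 * lam), 0.0, 0
    while True:
        nu = 0.5 * p + j                   # half the degrees of freedom of f_{p+2j}
        total += weight * 0.5 * math.exp((nu - 1.0) * math.log(0.5 * u) - 0.5 * u - math.lgamma(nu))
        j += 1
        weight *= 0.5 * lam / j
        if j > 0.5 * lam and weight < 1e-18:
            return total

def v_star(p, lam, n_intervals=2000):
    """V*_p(lambda) of Theorem 4.2.1 for p >= 2 and lambda > 0.

    Simpson's rule on a truncated range replaces the text's 10-point Gauss-Laguerre rule;
    the cut-off (mean + 20 standard deviations) is an assumption, not from the text.
    """
    q = 0.5 * p - 1.0
    upper = p + lam + 20.0 * math.sqrt(2.0 * (p + 2.0 * lam))
    h = upper / n_intervals

    def integrand(u):
        if u == 0.0:
            return 0.0                     # (u/lambda) * ratio^2 behaves like u^2 / p^2 near 0
        ratio = bessel_ratio(q, math.sqrt(lam * u))
        return (u / lam) * ratio * ratio * ncx2_pdf(u, p, lam)

    acc = integrand(0.0) + integrand(upper)
    for i in range(1, n_intervals):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * h)
    fisher_info = 0.25 * acc * h / 3.0 - 0.25
    return 1.0 / fisher_info

for lam in (0.5, 1.0, 2.0):
    print(lam, round(v_star(2, lam), 4), round(v_star(4, lam), 4))
```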
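By Theorem 4.3.2, we have

$$V_p(\lambda) = \left[\frac{d}{d\theta}g_p^{-1}(\theta)\right]^2\left[\phi^2 + r^{-1}\gamma^2\right]. \tag{4.4.3}$$

Since we have already shown that $g_p'(\lambda) < 0$ for all $\lambda$, the derivative of the inverse function at $\theta = g_p(\lambda)$ is $1/g_p'(\lambda)$. It follows that we can rewrite (4.4.3) as

$$V_p(\lambda) = \frac{\phi^2 + r^{-1}\gamma^2}{[g_p'(\lambda)]^2}. \tag{4.4.4}$$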

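By (4.3.3) and (4.3.4),

$$g_p'(\lambda) = -\frac{e^{-\frac12\lambda}}{2\,\Gamma(\frac12p)}\sum_{j=0}^{\infty}\frac{(\frac12\lambda)^j}{j!}\,\frac{\Gamma(p+j)}{\Gamma(\frac12p+j+1)}\left(\frac12\right)^{p+j}.$$

In this form, $g_p'(\lambda)$ can be computed easily. Values of $g_p'(\lambda)$ are included in Tables A.2(a)-(f). $\phi^2 + \theta^2$ and $\gamma^2 + \theta^2$ can each be expressed as an exponential factor times a power series in $\lambda$, and the coefficients of that series satisfy certain recurrence relations. In this way, $\phi^2$ and $\gamma^2$ can be computed numerically also. The exact forms of the series and the recurrence properties will be given in Theorem 4.4.1 for $\phi^2$ and in Theorem 4.4.2 for $\gamma^2$. First, however, we need several lemmas which are based on the following two formulas for integration by parts:

$$\int\frac{x^n\,dx}{(\beta+x)^m} = -\frac{x^n}{(m-1)(\beta+x)^{m-1}} + \frac{n}{m-1}\int\frac{x^{n-1}\,dx}{(\beta+x)^{m-1}}, \tag{4.4.5}$$

$$\int\frac{x^n\,dx}{(\beta+x)^m} = \frac{x^{n+1}}{(m-1)\beta(\beta+x)^{m-1}} + \frac{m-n-2}{(m-1)\beta}\int\frac{x^n\,dx}{(\beta+x)^{m-1}}. \tag{4.4.6}$$

These formulas are valid if $m > 1$ and $\beta > 0$.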
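The two integration-by-parts formulas are easy to verify numerically on a finite interval. The sketch below uses Simpson's rule to compare both sides of (4.4.5) and (4.4.6), each written as a definite integral. The exponents, $\beta$, and the interval are arbitrary test values, not taken from the text.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def check_4_4_5(n, m, beta, lo, hi):
    """(lhs, rhs) of (4.4.5) written as a definite integral over [lo, hi]."""
    lhs = simpson(lambda x: x ** n / (beta + x) ** m, lo, hi)
    boundary = -(hi ** n / (beta + hi) ** (m - 1) - lo ** n / (beta + lo) ** (m - 1)) / (m - 1)
    rhs = boundary + n / (m - 1) * simpson(lambda x: x ** (n - 1) / (beta + x) ** (m - 1), lo, hi)
    return lhs, rhs

def check_4_4_6(n, m, beta, lo, hi):
    """(lhs, rhs) of (4.4.6) written as a definite integral over [lo, hi]."""
    lhs = simpson(lambda x: x ** n / (beta + x) ** m, lo, hi)
    boundary = (hi ** (n + 1) / (beta + hi) ** (m - 1)
                - lo ** (n + 1) / (beta + lo) ** (m - 1)) / ((m - 1) * beta)
    rhs = boundary + (m - n - 2) / ((m - 1) * beta) * simpson(
        lambda x: x ** n / (beta + x) ** (m - 1), lo, hi)
    return lhs, rhs

# Hypothetical exponents; beta = 1.7 plays the role of 1 + y with y = 0.7
print(check_4_4_5(2.5, 4.2, 1.7, 0.5, 3.0))
print(check_4_4_6(1.5, 4.2, 1.7, 0.5, 3.0))
```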

Lemma 4.4.1. For $a, b \ge 0$ and $a + c > 1$,

$$\int_0^1\int_0^1\frac{x^{a+1}y^b\,dx\,dy}{(1+x+y)^{a+b+c+1}} = \frac{a+1}{a+b+c}\int_0^1\int_0^1\frac{x^ay^b\,dx\,dy}{(1+x+y)^{a+b+c}} - \frac{\Gamma(b+1)\Gamma(a+c-1)}{\Gamma(a+b+c+1)}\left(\frac12\right)^{a+c-1}I_{1/3}(b+1, a+c-1).$$


Proof:

$$\int_0^1\int_0^1\frac{x^{a+1}y^b\,dx\,dy}{(1+x+y)^{a+b+c+1}} = \int_0^1 y^bJ(y)\,dy, \tag{4.4.7}$$

where

$$J(y) = \int_0^1\frac{x^{a+1}\,dx}{(1+x+y)^{a+b+c+1}}.$$

Using (4.4.5) with $n = a+1$, $m = a+b+c+1$, $\beta = 1+y$, we obtain

$$J(y) = \left[-\frac{x^{a+1}}{(a+b+c)(1+x+y)^{a+b+c}}\right]_{x=0}^{1} + \frac{a+1}{a+b+c}\int_0^1\frac{x^a\,dx}{(1+x+y)^{a+b+c}} = -\frac{1}{(a+b+c)(2+y)^{a+b+c}} + \frac{a+1}{a+b+c}\int_0^1\frac{x^a\,dx}{(1+x+y)^{a+b+c}}. \tag{4.4.8}$$

Since

$$\int_0^1\frac{y^b\,dy}{(2+y)^{a+b+c}} = \left(\frac12\right)^{a+c-1}\frac{\Gamma(b+1)\Gamma(a+c-1)}{\Gamma(a+b+c)}I_{1/3}(b+1, a+c-1),$$

the result follows immediately on substitution of (4.4.8) in (4.4.7). □


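Lemma 4.4.2. For $a \ge 0$ and $c > a + 2$ (which ensures that both double integrals converge),

$$\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c+1}} = \frac{c-a-2}{a+c}\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c}} - \left(\frac12\right)^{c-2}\frac{\Gamma(a+1)\Gamma(c-1)}{\Gamma(a+c+1)}I_{2/3}(c-1, a+1).$$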


Proof: Using (4.4.6) with $n = a$, $m = a+c+1$, $\beta = 1+y$, we obtain the following upon multiplying both sides by $(1+y)y^a$ and integrating with respect to $y$:

$$\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c+1}} + \int_1^\infty\int_1^\infty\frac{x^ay^{a+1}\,dx\,dy}{(1+x+y)^{a+c+1}} = \frac{c-1}{a+c}\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c}} - \frac{1}{a+c}\int_1^\infty\frac{y^a\,dy}{(2+y)^{a+c}}. \tag{4.4.9}$$

Substituting $y$ for $x$ in (4.4.5), then letting $n = a+1$, $m = a+c+1$, $\beta = 1+x$, we obtain the following upon multiplying both sides by $x^a$ and integrating with respect to $x$:

$$\int_1^\infty\int_1^\infty\frac{x^ay^{a+1}\,dx\,dy}{(1+x+y)^{a+c+1}} = \frac{a+1}{a+c}\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c}} + \frac{1}{a+c}\int_1^\infty\frac{x^a\,dx}{(2+x)^{a+c}}. \tag{4.4.10}$$

Substituting (4.4.10) in (4.4.9), we obtain

$$\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c+1}} = \frac{c-a-2}{a+c}\int_1^\infty\int_1^\infty\frac{x^ay^a\,dx\,dy}{(1+x+y)^{a+c}} - \frac{2}{a+c}\int_1^\infty\frac{y^a\,dy}{(2+y)^{a+c}}. \tag{4.4.11}$$

Since

$$\int_1^\infty\frac{y^a\,dy}{(2+y)^{a+c}} = \left(\frac12\right)^{c-1}\frac{\Gamma(a+1)\Gamma(c-1)}{\Gamma(a+c)}I_{2/3}(c-1, a+1),$$

the result follows upon substitution in (4.4.11) and simplification. □


We will also need the form of the density function of the multivariate inverted beta distribution. It is given in Johnson and Kotz [12], p. 238, and is included here for reference:

If $X_0, X_1, \dots, X_n$ are independent random variables with $X_j$ distributed as $\chi_{\nu_j}^2$ ($j = 0, 1, \dots, n$), then $Y_j = X_j/X_0$ ($j = 1, \dots, n$) have a multivariate inverted beta distribution with density function

$$p_{Y_1,\dots,Y_n}(y_1, \dots, y_n) = \frac{\Gamma(\frac12\nu)}{\prod_{j=0}^{n}\Gamma(\frac12\nu_j)}\cdot\frac{\prod_{j=1}^{n}y_j^{\frac12\nu_j-1}}{\left(1+\sum_{j=1}^{n}y_j\right)^{\frac12\nu}} \qquad (y_j > 0,\ j = 1, \dots, n), \tag{4.4.12}$$

where $\nu = \sum_{j=0}^{n}\nu_j$.

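We now proceed with the series representations of $\phi^2 + \theta^2$ and $\gamma^2 + \theta^2$.

Theorem 4.4.1. Using the notation of Theorem 4.3.1, $\phi^2 + \theta^2$ can be expressed as

$$\phi^2 + \theta^2 = e^{-\frac12\lambda}\sum_{j=0}^{\infty}q_j\frac{(\frac12\lambda)^j}{j!}, \tag{4.4.13}$$

where $q_0 = 1/3$ and for $j = 0, 1, \dots$, the following recurrence relation is satisfied:

$$q_{j+1} = q_j - \frac{\Gamma(p+j)}{\Gamma(\frac12p)\Gamma(\frac12p+j+1)}\left(\frac12\right)^{p+j-1}I_{2/3}(p+j, \tfrac12p). \tag{4.4.14}$$

Further, if $R_n$ is the remainder after $n+1$ terms in (4.4.13), then

$$|R_n| \le (\tfrac12\lambda)^{n+1}q_{n+1}/(n+1)!. \tag{4.4.15}$$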


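Proof: We have $\phi^2 + \theta^2 = \Pr(X_1 < Y_1, X_1 < Y_2)$, where $X_1$ is distributed as $\chi_p^2(\lambda)$, $Y_1$ and $Y_2$ are distributed as $\chi_p^2$, and all three are independent. Since the density of $X_1$ has the representation (4.1.1), it is clear that we can rewrite $\phi^2 + \theta^2$ as in (4.4.13), where $q_j = \Pr(U_j < Y_1, U_j < Y_2)$, $Y_1$ and $Y_2$ being as before and $U_j$ being distributed independently as $\chi_{p+2j}^2$. Thus $q_j = \Pr(Z_j^{(1)} > 1, Z_j^{(2)} > 1)$, where $Z_j^{(1)} = Y_1/U_j$ and $Z_j^{(2)} = Y_2/U_j$. But $Z_j^{(1)}$ and $Z_j^{(2)}$ have a bivariate inverted beta distribution. Thus from (4.4.12), with $\nu_0 = p+2j+2$, $\nu_1 = \nu_2 = p$, we obtain

$$q_{j+1} = \frac{\Gamma(\frac32p+j+1)}{\Gamma(\frac12p+j+1)[\Gamma(\frac12p)]^2}\int_1^\infty\int_1^\infty\frac{x^{\frac12p-1}y^{\frac12p-1}\,dx\,dy}{(1+x+y)^{\frac32p+j+1}}.$$

Using Lemma 4.4.2 with $a = \frac12p - 1$, $c = p+j+1$, we obtain

$$q_{j+1} = \frac{\Gamma(\frac32p+j+1)}{\Gamma(\frac12p+j+1)[\Gamma(\frac12p)]^2}\left\{\frac{\frac12p+j}{\frac32p+j}\int_1^\infty\int_1^\infty\frac{x^{\frac12p-1}y^{\frac12p-1}\,dx\,dy}{(1+x+y)^{\frac32p+j}} - \left(\frac12\right)^{p+j-1}\frac{\Gamma(\frac12p)\Gamma(p+j)}{\Gamma(\frac32p+j+1)}I_{2/3}(p+j, \tfrac12p)\right\}.$$

(4.4.14) follows immediately on simplification. When $j = 0$, we have $q_0 = \Pr(U_0 < Y_1, U_0 < Y_2)$, where $U_0, Y_1, Y_2$ are i.i.d. Clearly

$$\Pr(U_0 < Y_1, U_0 < Y_2) = \Pr(Y_1 < U_0, Y_1 < Y_2) = \Pr(Y_2 < U_0, Y_2 < Y_1),$$

and since these three events are disjoint and together have probability one, $q_0 = 1/3$. Since $q_j$ decreases as $j$ increases, clearly

$$|R_n| = e^{-\frac12\lambda}\sum_{j=n+1}^{\infty}q_j\frac{(\frac12\lambda)^j}{j!} \le e^{-\frac12\lambda}q_{n+1}\sum_{j=n+1}^{\infty}\frac{(\frac12\lambda)^j}{j!}.$$

But $\sum_{j=n+1}^{\infty}(\frac12\lambda)^j/j!$ is the remainder after $n+1$ terms in the Taylor series expansion for $e^{\frac12\lambda}$. Thus for $\lambda > 0$,

$$\sum_{j=n+1}^{\infty}\frac{(\frac12\lambda)^j}{j!} \le \frac{(\frac12\lambda)^{n+1}}{(n+1)!}e^{\frac12\lambda},$$

and (4.4.15) follows immediately. □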
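Theorem 4.4.1 gives a practical algorithm for $\phi^2$. The sketch below implements (4.4.13)-(4.4.14) together with $\theta = g_p(\lambda)$ from (4.3.2). Python's standard library has no incomplete beta function, so the sketch includes a textbook continued-fraction evaluation (modified Lentz method). That method is a swapped-in technique, not something used in the text, and the function names are hypothetical. One check is derived here: for $p = 2$, (4.4.14) gives $q_j = 3^{-(j+1)}$, so $\phi^2 = \frac13e^{-\lambda/3} - \frac14e^{-\lambda/2}$.

```python
import math

def betainc(a, b, x, eps=1e-15, max_iter=500):
    """Regularized incomplete beta ratio I_x(a, b): continued fraction, modified Lentz method."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)          # I_x(a, b) = 1 - I_{1-x}(b, a)
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a

def theta_phi2(p, lam, n_terms=80):
    """theta = g_p(lambda) and phi^2 from (4.4.13)-(4.4.14), truncated after n_terms terms."""
    w = math.exp(-0.5 * lam)                     # Poisson(lambda/2) weight of j = 0
    alpha, q = 0.5, 1.0 / 3.0                    # alpha_{p,0} and q_0
    theta = second = 0.0
    for j in range(n_terms):
        theta += w * alpha
        second += w * q                          # accumulates phi^2 + theta^2
        c = math.exp(math.lgamma(p + j) - math.lgamma(0.5 * p) - math.lgamma(0.5 * p + j + 1)
                     - (p + j) * math.log(2.0))  # Gamma(p+j)/(Gamma(p/2)Gamma(p/2+j+1)) 2^-(p+j)
        alpha -= c                                # (4.3.4)
        q -= 2.0 * c * betainc(p + j, 0.5 * p, 2.0 / 3.0)   # (4.4.14)
        w *= 0.5 * lam / (j + 1)
    return theta, second - theta ** 2

for lam in (0.0, 1.0, 2.0, 5.0):
    print(lam, [round(theta_phi2(p, lam)[1], 6) for p in (2, 3, 4)])
```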


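Theorem 4.4.2. Using the notation of Theorem 4.3.1, $\gamma^2 + \theta^2$ can be expressed as

$$\gamma^2 + \theta^2 = e^{-\lambda}\sum_{j=0}^{\infty}q_j'(\tfrac12\lambda)^j, \tag{4.4.16}$$

where

$$q_j' = \sum_{k=0}^{j}\frac{p_{j-k,k}}{(j-k)!\,k!},$$

and the quantities $p_{j,k}$ satisfy the following recurrence relations:

(i) $p_{0,0} = 1/3$;

(ii) $$p_{j+1,k} = p_{j,k} - \frac{\Gamma(p+j)}{\Gamma(\frac12p)\Gamma(\frac12p+j+1)}\left(\frac12\right)^{p+j}I_{1/3}(\tfrac12p+k, p+j); \tag{4.4.17}$$

(iii) $$p_{j+1,j+1} = p_{j,j} - \frac{\Gamma(p+j)}{\Gamma(\frac12p)\Gamma(\frac12p+j+1)}\left(\frac12\right)^{p+j}\left\{I_{1/3}(\tfrac12p+j, p+j) + I_{1/3}(\tfrac12p+j+1, p+j)\right\}, \tag{4.4.18}$$

(ii) holding for $j, k = 0, 1, \dots$ and (iii) holding for $j = 0, 1, \dots$. Further, if $R_n$ is the remainder after $n+1$ terms in (4.4.16), then

$$|R_n| \le p_{n+1}^*\lambda^{n+1}/(n+1)!, \tag{4.4.19}$$

where $p_n^* = \max_{0 \le k \le n}p_{n-k,k}$.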

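Proof: $\gamma^2 + \theta^2 = \Pr(X_1 < Y_1, X_2 < Y_1)$, where $X_1$ and $X_2$ are distributed as $\chi_p^2(\lambda)$, $Y_1$ is distributed as $\chi_p^2$, and all three are independent. Since the densities of both $X_1$ and $X_2$ have the representation (4.1.1), we can rewrite $\gamma^2 + \theta^2$ as

$$\gamma^2 + \theta^2 = e^{-\lambda}\sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\frac{(\frac12\lambda)^{j+k}}{j!\,k!}\Pr(U_j^{(1)} < Y_1,\ U_k^{(2)} < Y_1),$$

where $U_j^{(1)}$, $U_k^{(2)}$ and $Y_1$ are independently distributed as $\chi_{p+2j}^2$, $\chi_{p+2k}^2$ and $\chi_p^2$, respectively. Letting $p_{j,k} = \Pr(U_j^{(1)} < Y_1, U_k^{(2)} < Y_1)$ and rearranging terms in the sum, we immediately obtain (4.4.16). Since $U_0^{(1)}$, $U_0^{(2)}$ and $Y_1$ are i.i.d., $p_{0,0} = 1/3$, as was $q_0$ in Theorem 4.4.1. We also have $p_{j,k} = \Pr(Z_j^{(1)} < 1, Z_k^{(2)} < 1)$, where $Z_j^{(1)} = U_j^{(1)}/Y_1$ and $Z_k^{(2)} = U_k^{(2)}/Y_1$, and thus $Z_j^{(1)}$ and $Z_k^{(2)}$ have a bivariate inverted beta distribution. Thus from (4.4.12) with $\nu_0 = p$, $\nu_1 = p+2j+2$, $\nu_2 = p+2k$, we obtain

$$p_{j+1,k} = \frac{\Gamma(\frac32p+j+k+1)}{\Gamma(\frac12p)\Gamma(\frac12p+j+1)\Gamma(\frac12p+k)}\int_0^1\int_0^1\frac{x^{\frac12p+j}y^{\frac12p+k-1}\,dx\,dy}{(1+x+y)^{\frac32p+j+k+1}}.$$

Using Lemma 4.4.1 with $a = \frac12p+j-1$, $b = \frac12p+k-1$, $c = \frac12p+2$, we obtain

$$p_{j+1,k} = \frac{\Gamma(\frac32p+j+k+1)}{\Gamma(\frac12p)\Gamma(\frac12p+j+1)\Gamma(\frac12p+k)}\left\{\frac{\frac12p+j}{\frac32p+j+k}\int_0^1\int_0^1\frac{x^{\frac12p+j-1}y^{\frac12p+k-1}\,dx\,dy}{(1+x+y)^{\frac32p+j+k}} - \frac{\Gamma(\frac12p+k)\Gamma(p+j)}{\Gamma(\frac32p+j+k+1)}\left(\frac12\right)^{p+j}I_{1/3}(\tfrac12p+k, p+j)\right\}.$$

(4.4.17) follows immediately on simplification. (4.4.18) follows similarly using Lemma 4.4.1 twice, first as it stands with $a = \frac12p+j-1$, $b = \frac12p+j$, $c = \frac12p+2$, and then with $x$ and $y$ interchanged and $a = \frac12p+j-1$, $b = \frac12p+j-1$, $c = \frac12p+2$.

For any $k$, $p_{j,k}$ decreases as $j$ increases, and by symmetry, for any $j$, $p_{j,k}$ decreases as $k$ increases. Thus for $j \ge n+1$ and $0 \le k \le j$, $p_{j-k,k} \le p_{n+1}^*$, with strict inequality at least when $j > n+1$. Thus

$$|R_n| = e^{-\lambda}\sum_{j=n+1}^{\infty}(\tfrac12\lambda)^j\sum_{k=0}^{j}\frac{p_{j-k,k}}{(j-k)!\,k!} < e^{-\lambda}p_{n+1}^*\sum_{j=n+1}^{\infty}(\tfrac12\lambda)^j\sum_{k=0}^{j}\frac{1}{(j-k)!\,k!}.$$

But

$$\sum_{k=0}^{j}\frac{1}{(j-k)!\,k!} = \frac{1}{j!}\sum_{k=0}^{j}\binom{j}{k} = \frac{2^j}{j!},$$

so that

$$|R_n| < e^{-\lambda}p_{n+1}^*\sum_{j=n+1}^{\infty}\frac{\lambda^j}{j!}.$$

Since $\sum_{j=n+1}^{\infty}\lambda^j/j!$ is the remainder after $n+1$ terms in the Taylor series expansion for $e^\lambda$, it follows that for $\lambda > 0$,

$$\sum_{j=n+1}^{\infty}\frac{\lambda^j}{j!} \le \frac{\lambda^{n+1}}{(n+1)!}e^\lambda,$$

and (4.4.19) follows upon substitution. □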
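Theorem 4.4.2 can be coded in the same way. The function names are hypothetical, and the double series is truncated at $j + k < 50$, which is an assumption. The row $p_{0,k}$ is filled from the symmetry $p_{0,k} = p_{k,0}$ rather than from (4.4.18). The two routes agree because $X_1$ and $X_2$ are exchangeable. For $p = 2$ the probabilities $p_{j,k}$ can also be written in closed form. That derivation was added for this sketch and gives an independent check on the recurrences.

```python
import math

def betainc(a, b, x, eps=1e-15, max_iter=500):
    """Regularized incomplete beta ratio I_x(a, b): continued fraction, modified Lentz method."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)          # I_x(a, b) = 1 - I_{1-x}(b, a)
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a

def theta_gamma2(p, lam, n_terms=50):
    """theta = g_p(lambda) and gamma^2 via Theorem 4.4.2 (series truncated at j + k < n_terms)."""
    def c(j):   # Gamma(p+j) / (Gamma(p/2) Gamma(p/2+j+1)) * (1/2)^(p+j)
        return math.exp(math.lgamma(p + j) - math.lgamma(0.5 * p)
                        - math.lgamma(0.5 * p + j + 1) - (p + j) * math.log(2.0))
    N = n_terms
    P = [[0.0] * N for _ in range(N)]
    P[0][0] = 1.0 / 3.0                                        # (i)
    for j in range(N - 1):                                     # column k = 0 by (4.4.17)
        P[j + 1][0] = P[j][0] - c(j) * betainc(0.5 * p, p + j, 1.0 / 3.0)
    for k in range(1, N):                                      # row 0 by symmetry
        P[0][k] = P[k][0]
    for j in range(N - 1):                                     # remaining entries by (4.4.17)
        for k in range(1, N - 1 - j):
            P[j + 1][k] = P[j][k] - c(j) * betainc(0.5 * p + k, p + j, 1.0 / 3.0)
    w = [math.exp(-0.5 * lam)]
    for j in range(1, N):
        w.append(w[-1] * 0.5 * lam / j)                        # Poisson(lambda/2) weights
    alpha = [0.5]
    for j in range(N - 1):
        alpha.append(alpha[-1] - c(j))                         # (4.3.4)
    theta = sum(wj * aj for wj, aj in zip(w, alpha))
    # sum of w_j w_k p_{j,k} is the double series that (4.4.16) rearranges
    total = sum(w[j] * w[k] * P[j][k] for j in range(N) for k in range(N - j))
    return theta, total - theta ** 2

def gamma2_direct_p2(lam, n=30):
    """Independent cross-check for p = 2 only, derived for this sketch (not from the text):
    p_{j,k} = 1 - sum_{i<=j} 2^-(i+1) - sum_{i<=k} 2^-(i+1) + sum_{i<=j, l<=k} C(i+l, i) 3^-(i+l+1)."""
    w = [math.exp(-0.5 * lam) * (0.5 * lam) ** j / math.factorial(j) for j in range(n)]
    half = [1.0 - 0.5 ** (j + 1) for j in range(n)]           # sum_{i<=j} 2^-(i+1)
    cum = [[0.0] * n for _ in range(n)]                       # 2-D prefix sums
    for j in range(n):
        for k in range(n):
            cum[j][k] = (math.comb(j + k, j) / 3.0 ** (j + k + 1)
                         + (cum[j - 1][k] if j else 0.0) + (cum[j][k - 1] if k else 0.0)
                         - (cum[j - 1][k - 1] if j and k else 0.0))
    theta = 0.5 * math.exp(-0.25 * lam)                       # g_2(lambda)
    s = sum(w[j] * w[k] * (1.0 - half[j] - half[k] + cum[j][k]) for j in range(n) for k in range(n))
    return s - theta ** 2

print(theta_gamma2(2, 1.5)[1], gamma2_direct_p2(1.5))
```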


Using the recurrence properties outlined in the above theorems, computation of $\phi^2$ and $\gamma^2$ is straightforward. These quantities are also included in Tables A.2(a)-(g).

The quantities $\phi^2$, $\gamma^2$, and $g_p'(\lambda)$ are sufficient for calculating $V_p(\lambda)$ and hence $e_p(\lambda)$ for any value of $r$, where $r = \lim_{n\to\infty}(m/n)$. Tables A.3(a)-(g) give $e_p(\lambda)$ for $\lambda = 0(.1)2(.5)5$ and $r = 1, 2, 4, 8$.
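The limit $\lambda \to 0$ can be evaluated in closed form, which gives a worked check on the tables. This derivation is a sketch added here and is not part of the original analysis. It rests on three facts.

- When $\lambda = 0$, the $X_j$ and $Y_i$ are identically distributed, so $\theta = \frac12$ and $\phi^2 = \gamma^2 = \frac13 - \frac14 = \frac1{12}$.
- By (4.3.3) and (4.3.4), $g_p'(0) = \frac12(\alpha_{p,1} - \alpha_{p,0})$.
- Since $I_{q+1}(z)/I_q(z) = z/p + O(z^3)$, the expression in Theorem 4.2.1 tends to $\frac14E(u^2)/p^2 - \frac14 = \frac14(p+2)/p - \frac14 = 1/(2p)$ for $u \sim \chi_p^2$. Hence $V_p^*(\lambda) \to 2p$.

Combining these gives

$$g_p'(0) = -\frac{\Gamma(p)}{2\,\Gamma(\frac12p)\,\Gamma(\frac12p+1)}\left(\frac12\right)^{p}, \qquad V_p(0) = \frac{\frac1{12}(1 + r^{-1})}{[g_p'(0)]^2},$$

$$\lim_{\lambda\to0}e_p(\lambda) = \frac{2p}{V_p(0)} = \frac{6p\,r}{r+1}\cdot\frac{[\Gamma(p)]^2\,4^{-p}}{[\Gamma(\frac12p)\,\Gamma(\frac12p+1)]^2}.$$

For $p = 2$ this is $\frac34\,r/(r+1)$. That equals $0.375$ at $r = 1$ and $0.6$ at $r = 4$, and it rises towards $0.75$ as $r$ grows. This behaviour is in line with the increase in efficiency with $r$ described below.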

Several features of the tables deserve specific mention. We notice, for example, that for fixed $p$ and $\lambda$, there is a rapid increase in efficiency as $r$ increases from 1 to 4. As $r$ increases further, there is additional increase in efficiency, but it is more gradual. We also note that for fixed $p$ and $r$, $e_p(\lambda)$ gradually rises to a maximum as $\lambda$ increases; then $e_p(\lambda)$ decreases, slowly at first, more rapidly for larger $\lambda$. However, in each case, the efficiency is relatively stable for values of $\lambda$ up to 2 or 3.

As mentioned in section 4.1, the need for estimating the noncentrality parameter often arises in problems concerning distances. If we consider the distances between points in the unit sphere, then the values of $\lambda$ which are of interest are in the range 0 to 2, where the efficiency is relatively constant.

A third feature of interest in the tables is the value of $\lambda$ for which maximum efficiency is attained. For the values of $p$ considered here, that value is about 2 when $r = 1$. As $r$ increases, the value where maximum efficiency is attained decreases, rapidly at first, then more slowly for larger $r$. The maximal efficiency for each combination of $p$ and $r$ is indicated by an asterisk in Tables A.3(a)-(g) to facilitate these comparisons.

In summary, we find that the randomized estimator $\hat\lambda$ is less efficient than the maximum likelihood estimator $\lambda^*$ when the data are appropriate for the use of $\lambda^*$. However, for moderate values of $r$, the loss of efficiency is not too severe, and $\hat\lambda$ is easier to compute than $\lambda^*$. $\hat\lambda$ also can be utilized when the data have been subjected to a monotonic distortion if control observations subject to the same distortion are available. Thus it is believed that the general procedure and the specific estimator which results from it should be of interest and value.


4.5 Extensions and generalizations.

The most obvious generalization of the results given here is the removal of the assumption that the sample observations have come from a noncentral chi-squared distribution. In fact, the general method used here would be applicable any time that $\Pr(X < Y)$ can be expressed as an invertible function of an unknown parameter. Perhaps the method could be applied in other situations where standard procedures lead to intractable estimators.
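As an illustration of this generalization, consider a hypothetical location model that is not treated in the text: $X \sim N(\mu, 1)$ and $Y \sim N(0, 1)$. In this model $\theta = \Pr(X < Y) = \Phi(-\mu/\sqrt2)$ is invertible, so $\mu = -\sqrt2\,\Phi^{-1}(t)$ plays the role of $g_p^{-1}(t)$. The sketch uses `statistics.NormalDist` from the Python standard library. The model, sample sizes and seed are assumptions chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

def shift_estimate(xs, ys):
    """Pr(X < Y) inversion for a hypothetical normal location model, X ~ N(mu, 1), Y ~ N(0, 1).

    There theta = Pr(X < Y) = Phi(-mu / sqrt(2)), so mu = -sqrt(2) * Phi^{-1}(theta).
    Requires 0 < t < 1 (inv_cdf raises StatisticsError otherwise).
    """
    t = sum(1 for x in xs for y in ys if y > x) / (len(xs) * len(ys))  # t = U/(mn)
    return -math.sqrt(2.0) * NormalDist().inv_cdf(t)

rng = random.Random(3)
mu_true = 1.0
xs = [rng.gauss(mu_true, 1.0) for _ in range(400)]
ys = [rng.gauss(0.0, 1.0) for _ in range(800)]
mu_hat = shift_estimate(xs, ys)
# Observing exp(X) and exp(Y) instead changes no comparison, so the estimate is the same
mu_hat_exp = shift_estimate([math.exp(x) for x in xs], [math.exp(y) for y in ys])
print(f"mu-hat = {mu_hat:.3f}, from exp-transformed data: {mu_hat_exp:.3f}")
```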



Another desirable generalization would be the removal of the assumption of independence among the observations. If this could be done, the results of this chapter might be applied directly to problems arising under models such as those considered in Chapters 2 and 3. The requirement of independent observations hampers those applications of the theory developed here. Work has been done on limit theory under various dependence assumptions, and that may be applicable to statistics of the form considered here. However, the author is unaware of such results when the two samples cannot even be considered independent of each other. Further investigation along these lines would be of interest and is being considered.


BIBLIOGRAPHY 


1.  Abramowitz, M., and Stegun, I.A. (1970), Handbook of Mathematical
    Functions with Formulas, Graphs, and Mathematical Tables, National
    Bureau of Standards, Applied Mathematics Series, no. 55.

2.  Anderson, T.W. (1958), An Introduction to Multivariate Statistical
    Analysis, New York, John Wiley and Sons, Inc.

3.  Cormack, R.M. (1971), "A review of classification," Journal of the
    Royal Statistical Society, Series A, 134, 321-353.

4.  Das Gupta, S. (1964), "Non-parametric classification rules," Sankhya,
    Series A, 26, 25-30.

5.  David, H.A. (1963), The Method of Paired Comparisons, London, Charles
    Griffin and Co., Ltd.

6.  Fraser, D.A.S. (1957), Nonparametric Methods in Statistics, New York,
    John Wiley and Sons, Inc.

7.  Gradshteyn, I.S., and Ryzhik, I.M. (1965), Table of Integrals, Series
    and Products, New York, Academic Press.

8.  Gurland, J. (1953), "Distribution of quadratic forms and ratios of
    quadratic forms," Annals of Mathematical Statistics, 24, 416-427.

9.  Gurland, J. (1956), "Quadratic forms in normally distributed random
    variables," Sankhya, 17, 37-50.

10. Hartigan, J.A. (1975), Clustering Algorithms, New York, John Wiley
    and Sons, Inc.

11. Johnson, N.L., and Kotz, S. (1970), Distributions in Statistics:
    Continuous Univariate Distributions - 2, Boston, Houghton Mifflin.

12. Johnson, N.L., and Kotz, S. (1972), Distributions in Statistics:
    Continuous Multivariate Distributions, New York, John Wiley and Sons.

13. Kendall, M.G., and Stuart, A. (1963), The Advanced Theory of
    Statistics, Vol. 1, London, Charles Griffin and Co., Ltd.

14. Kendall, M.G., and Stuart, A. (1967), The Advanced Theory of
    Statistics, Vol. 2, New York, Hafner Publishing Co.

15. Kendall, M.G., and Stuart, A. (1967), The Advanced Theory of
    Statistics, Vol. 3, New York, Hafner Publishing Co.

16. Kruskal, J.B. (1964), "Multidimensional scaling by optimizing
    goodness of fit to a nonmetric hypothesis," Psychometrika, 29, 1-27.




17. Kruskal, J.B. (1964), "Nonmetric multidimensional scaling: A
    numerical method," Psychometrika, 29, 115-129.

18. Meyer, P.L. (1967), "The maximum likelihood estimate of the
    noncentrality parameter of a noncentral chi-squared variate,"
    Journal of the American Statistical Association, 62, 1258-1264.

19. Pandey, J.N., and Rahman, M. (1971), "The maximum likelihood
    estimate of the noncentrality parameter of a noncentral F variate,"
    SIAM Journal on Mathematical Analysis, 2, 269-276.

20. Press, S.J. (1966), "Linear combinations of non-central chi-square
    variates," Annals of Mathematical Statistics, 37, 480-487.

21. Rao, C.R. (1965), Linear Statistical Inference and its Applications,
    New York, John Wiley and Sons, Inc.

22. Robbins, H. (1948), "The distribution of a definite quadratic form,"
    Annals of Mathematical Statistics, 19, 266-270.

23. Shepard, R.N. (1962), "The analysis of proximities: Multidimensional
    scaling with an unknown distance function, I," Psychometrika, 27,
    125-140.

24. Shepard, R.N. (1962), "The analysis of proximities: Multidimensional
    scaling with an unknown distance function, II," Psychometrika, 27,
    219-246.

25. Shepard, R.N. (1972), "A taxonomy of some principal types of data
    and of multidimensional methods for their analysis," in
    Multidimensional Scaling: Theory and Applications in the Behavioral
    Sciences, R.N. Shepard, A.K. Romney, and S.B. Nerlove, eds., Vol. 1,
    21-47, New York, Seminar Press.

26. Van Dantzig, D. (1951), "On the consistency and the power of
    Wilcoxon's two sample test," Nederlandse Akademie van Wetenschappen,
    Proceedings, Series A, 54, 1-8.




APPENDIX 


TABLE A.1:

                                 θ
   P    p = 2   p = 3   p = 4   p = 5   p = 6   p = 7   p = 8
 .50    0.0     0.0     0.0     0.0     0.0     0.0     0.0
 .49    .081    .095+   .108    .119    .129    .139    .147
 .48    .163    .192    .217    .240    .260    .279    .297
 .47    .248    .291    .329    .363    .394    .422    .449
 .46    .334    .392    .443    .488    .529    .568    .603
 .45    .421    .495+   .559    .616    .668    .715+   .760
 .44    .511    .600    .677    .746    .808    .866    .920
 .43    .603    .708    .798    .878    .952    1.019   1.082
 .42    .697    .818    .921    1.014   1.098   1.175+  1.248
 .41    .794    .930    1.047   1.152   1.247   1.335-  1.416
 .40    .893    1.045+  1.176   1.293   1.399   1.497   1.588
 .39    .994    1.163   1.308   1.437   1.554   1.663   1.763
 .38    1.098   1.284   1.443   1.585-  1.713   1.832   1.942
 .37    1.204   1.408   1.581   1.736   1.876   2.005-  2.125-
 .36    1.314   1.535-  1.723   1.890   2.042   2.181   2.312
 .35    1.427   1.665-  1.869   2.049   2.212   2.362   2.502
 .34    1.543   1.799   2.018   2.211   2.386   2.548   2.698
 .33    1.662   1.937   2.171   2.378   2.565+  2.738   2.898
 .32    1.785+  2.079   2.329   2.549   2.749   2.933   3.104
 .31    1.912   2.225+  2.491   2.726   2.938   3.133   3.314
 .30    2.043   2.376   2.658   2.907   3.132   3.339   3.531
 .29    2.179   2.532   2.830   3.094   3.332   3.550+  3.754
 .28    2.319   2.693   3.008   3.287   3.538   3.768   3.983
 .27    2.465-  2.859   3.192   3.486   3.750+  3.993   4.219
 .26    2.616   3.032   3.383   3.692   3.971   4.226   4.463
 .25    2.773   3.211   3.580   3.905-  4.197   4.466   4.715+
 .24    2.936   3.397   3.785+  4.126   4.433   4.715-  4.976
 .23    3.106   3.591   3.998   4.356   4.678   4.973   5.247
 .22    3.284   3.793   4.220   4.595-  4.932   5.241   5.528
 .21    3.470   4.004   4.452   4.844   5.197   5.520   5.821
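The entries of Table A.1 can be regenerated numerically under one reading of the tables that is inferred here rather than stated in this section. Under that reading, g_p(λ) = Pr(Y < X) for independent X ~ χ²_p (central) and Y ~ χ²_p(λ) (noncentral, noncentrality λ), and Table A.1 gives the θ solving g_p(θ) = P. With this reading the p = 2 column is exactly -4 log(2P), and spot checks of the other columns agree with the g_p(λ) columns of Tables A.2 to the printed precision.

The Python sketch below computes g_p from its Poisson mixture and inverts it by bisection. As a hypothetical illustration of a Wilcoxon-based estimator (not the author's code), it also plugs in the Mann-Whitney proportion from a central and a noncentral sample. The names g, theta and estimate_lambda are assumptions of this sketch, as is the convention of returning 0 when the proportion is at least 1/2.

```python
import math


def g(p: int, lam: float) -> float:
    """Pr(Y < X) for independent X ~ chi2_p and Y ~ noncentral chi2_p(lam).

    Assumed reading of g_p(lam). Uses the Poisson mixture Y ~ chi2_{p+2J},
    J ~ Poisson(lam/2), and Pr(chi2_{p+2j} < chi2_p) = I_{1/2}(p/2 + j, p/2),
    where I_x(a, b) is the regularized incomplete beta function.
    """
    b = p / 2.0
    a = b
    i_half = 0.5            # I_{1/2}(p/2, p/2) = 1/2 by symmetry
    mu = lam / 2.0
    weight = math.exp(-mu)  # Poisson(mu) probability of J = 0
    total = weight * i_half
    for j in range(1, 10_000):
        # I_x(a + 1, b) = I_x(a, b) - x^a (1 - x)^b / (a B(a, b)), here x = 1/2
        log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        i_half -= math.exp((a + b) * math.log(0.5) - math.log(a) - log_beta)
        a += 1.0
        weight *= mu / j    # Poisson(mu) probability of J = j
        total += weight * i_half
        if j > mu and weight < 1e-18:
            break           # remaining Poisson mass is negligible
    return total


def theta(p: int, prob: float, hi: float = 60.0, tol: float = 1e-10) -> float:
    """Solve g(p, t) = prob for t >= 0 by bisection (g falls from 1/2 at t = 0).

    Assumes g(p, hi) < prob, which holds for the probabilities in Table A.1.
    """
    if not 0.0 < prob <= 0.5:
        raise ValueError("prob must lie in (0, 1/2]")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(p, mid) > prob:
            lo = mid        # g(mid) still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_lambda(central, noncentral, p: int) -> float:
    """Hypothetical sketch: invert the Mann-Whitney proportion through g_p."""
    pairs = len(central) * len(noncentral)
    # Fraction of pairs (x, y) with y < x, scoring ties as 1/2.
    u = sum((y < x) + 0.5 * (y == x) for x in central for y in noncentral) / pairs
    if u >= 0.5:
        return 0.0          # boundary convention chosen for this sketch only
    if u == 0.0:
        raise ValueError("no noncentral value falls below a central one")
    return theta(p, u)


if __name__ == "__main__":
    # Rows P = .49, .40 and .21 of Table A.1 for p = 2, 3, 4.
    for prob in (0.49, 0.40, 0.21):
        print(prob, [round(theta(p, prob), 3) for p in (2, 3, 4)])
```

For example, theta(2, 0.21) returns about 3.470 and theta(4, 0.41) about 1.047, matching the corresponding entries of Table A.1. Bisection is used because g_p decreases monotonically in λ (the noncentral variable is stochastically increasing in λ), so no derivative is needed.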



TABLE A.2(a): Factors for Efficiency Computations (p = 2)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.1250   .0833    .0833    4.000
 0.1    .488    -.1219   .0846    .0819    4.392
 0.2    .476    -.1189   .0856    .0806    4.771
 0.3    .464    -.1160   .0864    .0792    5.143
 0.4    .452    -.1131   .0870    .0778    5.510
 0.5    .441    -.1103   .0875    .0764    5.874
 0.6    .430    -.1076   .0877    .0750    6.235
 0.7    .420    -.1049   .0878    .0736    6.595
 0.8    .409    -.1023   .0877    .0723    6.954
 0.9    .399    -.0998   .0875    .0709    7.313
 1.0    .389    -.0974   .0872    .0695    7.671
 1.1    .380    -.0950   .0868    .0682    8.030
 1.2    .370    -.0926   .0862    .0668    8.389
 1.3    .361    -.0903   .0856    .0655    8.748
 1.4    .352    -.0881   .0848    .0641    9.108
 1.5    .344    -.0859   .0841    .0628    9.469
 1.6    .335    -.0838   .0832    .0615    9.831
 1.7    .327    -.0817   .0823    .0602    10.193
 1.8    .319    -.0797   .0813    .0589    10.557
 1.9    .311    -.0777   .0802    .0576    10.921
 2.0    .303    -.0758   .0792    .0564    11.286
 2.5    .268    -.0669   .0732    .0503    13.125
 3.0    .236    -.0590   .0668    .0445    14.986
 3.5    .208    -.0521   .0603    .0393    16.868
 4.0    .184    -.0460   .0539    .0344    18.766
 4.5    .162    -.0406   .0479    .0300    20.680
 5.0    .143    -.0358   .0422    .0261    22.606
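For p = 2 the g_p(λ) and g'_p(λ) columns of Table A.2(a) can be checked in closed form. This rests on an inferred reading (not stated in this section): g_p(λ) = Pr(Y < X), with X ~ χ²_p central and Y ~ χ²_p(λ) noncentral, independent. When p = 2, Pr(X > y) = exp(-y/2), so g_2(λ) = E[exp(-Y/2)]. That is the noncentral chi-square moment generating function (1 - 2t)^(-p/2) exp(λt/(1 - 2t)) evaluated at t = -1/2, which gives g_2(λ) = (1/2) exp(-λ/4) and g'_2(λ) = -(1/8) exp(-λ/4). The function names below are illustrative.

```python
import math


def g2(lam: float) -> float:
    """g_2(lam) = E[exp(-Y/2)] = (1/2) exp(-lam/4) for Y ~ noncentral chi2_2(lam)."""
    return 0.5 * math.exp(-lam / 4.0)


def g2_prime(lam: float) -> float:
    """Derivative of g2 with respect to lam: -(1/8) exp(-lam/4)."""
    return -0.125 * math.exp(-lam / 4.0)


def theta2(prob: float) -> float:
    """Inverse of g2 on 0 < prob <= 1/2, i.e. lam = -4 log(2 prob)."""
    return -4.0 * math.log(2.0 * prob)


# Compare with the lam = 0.0, 1.0, 2.0 and 5.0 rows of Table A.2(a).
for lam in (0.0, 1.0, 2.0, 5.0):
    print(f"{lam:3.1f}  {g2(lam):.3f}  {g2_prime(lam):.4f}")
```

Rounded to three and four decimals, the printed values reproduce the .500/-.1250, .389/-.0974, .303/-.0758 and .143/-.0358 entries. Likewise, theta2(0.21) ≈ 3.470 matches the p = 2 column of Table A.1.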



TABLE A.2(b): Factors for Efficiency Computations (p = 3)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.1061   .0833    .0833    6.000
 0.1    .489    -.1040   .0842    .0824    6.395
 0.2    .479    -.1019   .0849    .0814    6.783
 0.3    .469    -.0999   .0855    .0804    7.165
 0.4    .459    -.0979   .0860    .0793    7.544
 0.5    .450    -.0960   .0863    .0783    7.920
 0.6    .440    -.0940   .0865    .0772    8.293
 0.7    .431    -.0921   .0865    .0762    8.666
 0.8    .422    -.0903   .0865    .0751    9.036
 0.9    .413    -.0885   .0864    .0740    9.407
 1.0    .404    -.0867   .0862    .0729    9.776
 1.1    .395    -.0849   .0859    .0718    10.146
 1.2    .387    -.0832   .0855    .0707    10.515
 1.3    .379    -.0815   .0850    .0696    10.884
 1.4    .371    -.0799   .0845    .0685    11.253
 1.5    .363    -.0783   .0839    .0673    11.623
 1.6    .355    -.0767   .0832    .0662    11.993
 1.7    .347    -.0751   .0825    .0651    12.363
 1.8    .340    -.0736   .0818    .0640    12.734
 1.9    .333    -.0721   .0810    .0628    13.014
 2.0    .326    -.0706   .0801    .0617    13.476
 2.5    .292    -.0636   .0754    .0562    15.341
 3.0    .262    -.0573   .0701    .0508    17.219
 3.5    .235    -.0515   .0645    .0457    19.111
 4.0    .210    -.0463   .0589    .0409    21.015
 4.5    .188    -.0416   .0533    .0363    22.930
 5.0    .168    -.0374   .0480    .0321    24.855



TABLE A.2(c): Factors for Efficiency Computations (p = 4)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.0938   .0833    .0833    8.000
 0.1    .491    -.0922   .0840    .0826    8.397
 0.2    .482    -.0907   .0846    .0818    8.789
 0.3    .473    -.0892   .0850    .0810    9.177
 0.4    .464    -.0877   .0854    .0802    9.562
 0.5    .455    -.0862   .0856    .0794    9.945
 0.6    .446    -.0847   .0858    .0785    10.326
 0.7    .438    -.0833   .0859    .0776    10.705
 0.8    .430    -.0819   .0858    .0767    11.083
 0.9    .422    -.0805   .0858    .0758    11.461
 1.0    .414    -.0791   .0856    .0749    11.838
 1.1    .406    -.0777   .0854    .0740    12.214
 1.2    .398    -.0764   .0851    .0730    12.590
 1.3    .391    -.0751   .0847    .0721    12.965
 1.4    .383    -.0738   .0843    .0711    13.341
 1.5    .376    -.0725   .0838    .0701    13.716
 1.6    .369    -.0712   .0833    .0691    14.092
 1.7    .362    -.0700   .0827    .0681    14.468
 1.8    .355    -.0687   .0821    .0672    14.843
 1.9    .348    -.0675   .0815    .0662    15.219
 2.0    .341    -.0663   .0808    .0652    15.596
 2.5    .309    -.0606   .0768    .0601    17.481
 3.0    .280    -.0554   .0723    .0551    19.374
 3.5    .254    -.0505   .0674    .0503    21.276
 4.0    .230    -.0460   .0624    .0455    23.187
 4.5    .208    -.0418   .0573    .0411    25.106
 5.0    .188    -.0381   .0523    .0368    27.034



TABLE A.2(d): Factors for Efficiency Computations (p = 5)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.0849   .0833    .0833    10.000
 0.1    .492    -.0837   .0839    .0827    10.398
 0.2    .483    -.0825   .0843    .0821    10.792
 0.3    .475    -.0813   .0847    .0814    11.184
 0.4    .467    -.0801   .0850    .0808    11.573
 0.5    .459    -.0789   .0852    .0800    11.960
 0.6    .451    -.0778   .0854    .0793    12.345
 0.7    .443    -.0766   .0854    .0786    12.729
 0.8    .436    -.0755   .0854    .0778    13.112
 0.9    .428    -.0744   .0854    .0770    13.495
 1.0    .421    -.0733   .0852    .0762    13.876
 1.1    .414    -.0722   .0851    .0754    14.257
 1.2    .407    -.0711   .0848    .0746    14.638
 1.3    .400    -.0700   .0845    .0738    15.018
 1.4    .393    -.0690   .0842    .0729    15.398
 1.5    .386    -.0679   .0838    .0720    15.778
 1.6    .379    -.0669   .0834    .0712    16.158
 1.7    .372    -.0658   .0829    .0703    16.537
 1.8    .366    -.0648   .0824    .0694    16.917
 1.9    .359    -.0638   .0818    .0685    17.297
 2.0    .353    -.0628   .0813    .0676    17.677
 2.5    .323    -.0580   .0779    .0630    19.578
 3.0    .295    -.0535   .0739    .0583    21.484
 3.5    .269    -.0493   .0695    .0537    23.396
 4.0    .246    -.0453   .0650    .0492    25.314
 4.5    .224    -.0416   .0603    .0448    27.238
 5.0    .204    -.0382   .0556    .0406    29.169



TABLE A.2(e): Factors for Efficiency Computations (p = 6)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.0781   .0833    .0833    12.000
 0.1    .492    -.0771   .0838    .0828    12.399
 0.2    .485    -.0762   .0842    .0823    12.794
 0.3    .477    -.0752   .0845    .0817    13.188
 0.4    .470    -.0743   .0847    .0811    13.579
 0.5    .462    -.0733   .0849    .0805    13.969
 0.6    .455    -.0724   .0851    .0799    14.358
 0.7    .448    -.0714   .0851    .0793    14.745
 0.8    .441    -.0705   .0851    .0786    15.132
 0.9    .434    -.0696   .0851    .0779    15.518
 1.0    .427    -.0686   .0850    .0772    15.902
 1.1    .420    -.0677   .0848    .0765    16.287
 1.2    .413    -.0668   .0846    .0757    16.671
 1.3    .406    -.0659   .0844    .0750    17.055
 1.4    .400    -.0650   .0841    .0742    17.438
 1.5    .393    -.0641   .0838    .0734    17.821
 1.6    .387    -.0633   .0834    .0727    18.204
 1.7    .381    -.0624   .0830    .0719    18.587
 1.8    .375    -.0615   .0826    .0711    18.970
 1.9    .369    -.0607   .0821    .0702    19.352
 2.0    .362    -.0598   .0816    .0694    19.735
 2.5    .334    -.0557   .0786    .0652    21.649
 3.0    .307    -.0518   .0751    .0608    23.565
 3.5    .282    -.0481   .0712    .0565    25.486
 4.0    .259    -.0445   .0670    .0522    27.411
 4.5    .237    -.0412   .0627    .0479    29.341
 5.0    .217    -.0381   .0583    .0438    31.275



TABLE A.2(f): Factors for Efficiency Computations (p = 7)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.0728   .0833    .0833    14.000
 0.1    .493    -.0719   .0837    .0829    14.399
 0.2    .486    -.0711   .0841    .0824    14.796
 0.3    .479    -.0703   .0843    .0819    15.191
 0.4    .472    -.0695   .0846    .0814    15.584
 0.5    .465    -.0687   .0847    .0809    15.976
 0.6    .458    -.0679   .0848    .0803    16.367
 0.7    .451    -.0672   .0849    .0798    16.756
 0.8    .444    -.0664   .0849    .0792    17.145
 0.9    .438    -.0656   .0849    .0785    17.534
 1.0    .431    -.0648   .0848    .0779    17.921
 1.1    .425    -.0640   .0847    .0773    18.308
 1.2    .418    -.0633   .0845    .0766    18.695
 1.3    .412    -.0625   .0843    .0759    19.081
 1.4    .406    -.0617   .0841    .0752    19.467
 1.5    .400    -.0610   .0838    .0745    19.853
 1.6    .394    -.0602   .0835    .0738    20.238
 1.7    .388    -.0595   .0831    .0731    20.623
 1.8    .382    -.0587   .0827    .0724    21.009
 1.9    .376    -.0580   .0823    .0716    21.393
 2.0    .370    -.0572   .0818    .0708    21.779
 2.5    .343    -.0536   .0729    .0669    23.703
 3.0    .317    -.0502   .0760    .0629    25.628
 3.5    .292    -.0469   .0725    .0587    27.557
 4.0    .270    -.0437   .0686    .0546    29.487
 4.5    .249    -.0407   .0646    .0505    31.422
 5.0    .229    -.0379   .0604    .0465    33.361



TABLE A.2(g): Factors for Efficiency Computations (p = 8)

   λ    g_p(λ)  g'_p(λ)  φ²       γ²       V*_p(λ)
 0.0    .500    -.0684   .0833    .0833    16.000
 0.1    .493    -.0677   .0837    .0829    16.400
 0.2    .486    -.0670   .0840    .0825    16.797
 0.3    .480    -.0663   .0842    .0821    17.193
 0.4    .473    -.0656   .0844    .0816    17.587
 0.5    .467    -.0649   .0846    .0812    17.981
 0.6    .460    -.0643   .0847    .0807    18.373
 0.7    .454    -.0636   .0847    .0802    18.765
 0.8    .447    -.0629   .0847    .0796    19.155
 0.9    .441    -.0622   .0847    .0791    19.546
 1.0    .435    -.0616   .0846    .0785    19.935
 1.1    .429    -.0609   .0845    .0779    20.324
 1.2    .423    -.0602   .0844    .0773    20.713
 1.3    .417    -.0596   .0842    .0767    21.101
 1.4    .411    -.0589   .0840    .0761    21.489
 1.5    .405    -.0582   .0838    .0754    21.877
 1.6    .399    -.0576   .0835    .0747    22.264
 1.7    .394    -.0569   .0832    .0741    22.651
 1.8    .388    -.0563   .0828    .0734    23.038
 1.9    .382    -.0556   .0824    .0727    23.425
 2.0    .377    -.0550   .0820    .0720    23.812
 2.5    .350    -.0518   .0797    .0683    25.745
 3.0    .325    -.0487   .0768    .0645    27.678
 3.5    .301    -.0457   .0735    .0606    29.613
 4.0    .279    -.0429   .0699    .0567    31.550
 4.5    .259    -.0401   .0661    .0528    33.489
 5.0    .239    -.0375   .0623    .0489    35.432



TABLE A.3(a): Efficiency (p = 2)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .375    .500    .600    .667
 0.1    .392    .520    .621    .688
 0.2    .406    .536    .638    .705
 0.3    .418    .549    .651    .718
 0.4    .428    .560    .662    .728
 0.5    .436    .569    .671    .737
 0.6    .444    .576    .678    .743
 0.7    .450    .583    .684    .749
 0.8    .455    .588    .688    .753
 0.9    .460    .592    .692    .756
 1.0    .464    .596    .695    .758
 1.1    .467    .599    .697    .760
 1.2    .470    .601    .699    .761
 1.3    .472    .603    .700    .761*
 1.4    .474    .604    .700    .761
 1.5    .476    .605    .700*   .760
 1.6    .477    .606    .700    .759
 1.7    .478    .606*   .699    .758
 1.8    .478    .606    .698    .756
 1.9    .479    .605    .697    .755
 2.0    .479*   .604    .696    .753
 2.5    .476    .597    .685    .739
 3.0    .469    .587    .670    .722
 3.5    .460    .573    .653    .703
 4.0    .449    .558    .635    .682
 4.5    .437    .542    .615    .660
 5.0    .425    .525    .595    .637
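The efficiencies in Tables A.3 appear to be obtainable from the factors in Tables A.2 through e = V*_p(λ) [g'_p(λ)]² / (φ² + γ²/r). This relation is inferred here and is not quoted from the text. It reproduces the λ = 0 row of Table A.3(a) exactly: 4(1/8)² / ((1 + 1/r)/12) gives .375, .500, .600 and .667. It also reproduces other entries to within about .001, which is what rounding of the tabulated factors would allow. Reading r as the ratio of the central to the noncentral sample size is likewise an assumption. A minimal Python sketch:

```python
def efficiency(v_star: float, g_prime: float, phi2: float, gamma2: float, r: float) -> float:
    """e = V*_p(lam) * g'_p(lam)**2 / (phi2 + gamma2 / r), a relation inferred from the tables."""
    return v_star * g_prime ** 2 / (phi2 + gamma2 / r)


# Row lam = 1.0 of Table A.2(a) (p = 2).
row = {"v_star": 7.671, "g_prime": -0.0974, "phi2": 0.0872, "gamma2": 0.0695}
for r in (1, 2, 4, 8):
    # Table A.3(a) lists .464, .596, .695 and .758 for this row.
    print(r, round(efficiency(r=r, **row), 3))
```

The last printed digit can differ from the table by one, because the inputs are the already-rounded four-figure factors rather than the values used to build Table A.3.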



TABLE A.3(b): Efficiency (p = 3)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .405    .540    .648    .721
 0.1    .415    .552    .660    .732
 0.2    .424    .561    .669    .741
 0.3    .431    .569    .677    .748
 0.4    .438    .576    .684    .754
 0.5    .443    .581    .689    .759
 0.6    .448    .586    .693    .763
 0.7    .452    .590    .697    .766
 0.8    .456    .594    .700    .768
 0.9    .459    .597    .702    .770
 1.0    .462    .599    .704    .771
 1.1    .464    .601    .705    .772
 1.2    .466    .603    .706    .772*
 1.3    .468    .604    .707    .772
 1.4    .469    .605    .707*   .772
 1.5    .471    .605    .707    .771
 1.6    .472    .606    .706    .770
 1.7    .472    .606*   .706    .769
 1.8    .473    .606    .705    .768
 1.9    .473    .605    .704    .766
 2.0    .473*   .605    .702    .764
 2.5    .472    .600    .694    .753
 3.0    .467    .591    .682    .739
 3.5    .461    .581    .668    .723
 4.0    .452    .569    .653    .705
 4.5    .443    .556    .637    .687
 5.0    .434    .542    .620    .668



TABLE A.3(c): Efficiency (p = 4)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .422    .562    .675    .750
 0.1    .428    .570    .682    .757
 0.2    .434    .576    .688    .762
 0.3    .439    .581    .693    .767
 0.4    .444    .586    .697    .770
 0.5    .448    .589    .700    .773
 0.6    .451    .593    .703    .775
 0.7    .454    .596    .705    .777
 0.8    .457    .598    .707    .778
 0.9    .459    .600    .709    .779
 1.0    .461    .602    .710    .780
 1.1    .463    .603    .711    .780*
 1.2    .465    .604    .711    .780
 1.3    .466    .605    .711*   .780
 1.4    .467    .606    .711    .779
 1.5    .468    .606    .711    .778
 1.6    .469    .606    .711    .777
 1.7    .469    .606*   .710    .776
 1.8    .470    .606    .709    .775
 1.9    .470    .606    .708    .773
 2.0    .470*   .605    .707    .772
 2.5    .469    .601    .700    .762
 3.0    .466    .594    .690    .750
 3.5    .461    .586    .678    .736
 4.0    .454    .576    .665    .720
 4.5    .447    .565    .651    .704
 5.0    .439    .553    .636    .688



TABLE A.3(d): Efficiency (p = 5)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .432    .576    .692    .769
 0.1    .437    .581    .696    .773
 0.2    .441    .586    .700    .776
 0.3    .445    .589    .703    .779
 0.4    .448    .592    .706    .781
 0.5    .451    .595    .708    .783
 0.6    .454    .598    .710    .784
 0.7    .456    .600    .712    .785
 0.8    .458    .601    .713    .786
 0.9    .460    .603    .714    .786
 1.0    .462    .604    .715    .786*
 1.1    .463    .605    .715    .786
 1.2    .464    .606    .715    .786
 1.3    .465    .607    .715*   .785
 1.4    .466    .607    .715    .785
 1.5    .467    .607    .715    .784
 1.6    .467    .607*   .714    .783
 1.7    .468    .607    .713    .782
 1.8    .468    .607    .713    .781
 1.9    .468    .607    .712    .779
 2.0    .469*   .606    .711    .778
 2.5    .468    .603    .704    .769
 3.0    .465    .597    .695    .758
 3.5    .461    .589    .685    .745
 4.0    .456    .581    .673    .731
 4.5    .449    .571    .661    .717
 5.0    .443    .561    .648    .702



TABLE A.3(e): Efficiency (p = 6)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .439    .586    .703    .781
 0.1    .443    .589    .706    .784
 0.2    .446    .592    .709    .786
 0.3    .449    .595    .711    .788
 0.4    .451    .598    .713    .789
 0.5    .454    .600    .714    .790
 0.6    .456    .601    .716    .791
 0.7    .458    .603    .717    .791
 0.8    .459    .604    .718    .792
 0.9    .461    .605    .718    .792
 1.0    .462    .606    .718    .792
 1.1    .463    .607    .719    .791
 1.2    .464    .608    .719*   .791
 1.3    .465    .608    .718    .790
 1.4    .466    .608    .718    .789
 1.5    .466    .608    .718    .789
 1.6    .467    .608*   .717    .788
 1.7    .467    .608    .716    .786
 1.8    .467    .608    .716    .785
 1.9    .468    .608    .715    .784
 2.0    .468*   .607    .714    .782
 2.5    .467    .604    .708    .774
 3.0    .465    .599    .700    .764
 3.5    .461    .592    .690    .753
 4.0    .457    .584    .680    .740
 4.5    .451    .576    .668    .727
 5.0    .445    .567    .657    .713



TABLE A.3(g): Efficiency (p = 8)
[* denotes maximum in each column]

   λ    r = 1   r = 2   r = 4   r = 8
 0.0    .449    .598    .718    .798
 0.1    .451    .600    .719    .799
 0.2    .453    .602    .721    .799
 0.3    .455    .603    .722    .800
 0.4    .456    .605    .723    .800
 0.5    .458    .606    .723    .801
 0.6    .459    .607    .724    .801*
 0.7    .460    .608    .724    .801
 0.8    .461    .609    .724    .801
 0.9    .462    .609    .725*   .800
 1.0    .463    .610    .725    .800
 1.1    .464    .610    .724    .799
 1.2    .465    .610    .724    .799
 1.3    .465    .611    .724    .798
 1.4    .466    .611*   .723    .797
 1.5    .466    .611    .723    .796
 1.6    .466    .611    .722    .795
 1.7    .467    .610    .722    .794
 1.8    .467    .610    .721    .793
 1.9    .467    .610    .720    .791
 2.0    .467*   .609    .719    .790
 2.5    .466    .606    .713    .783
 3.0    .465    .602    .707    .774
 3.5    .462    .597    .699    .764
 4.0    .458    .590    .690    .753
 4.5    .454    .583    .680    .742
 5.0    .449    .575    .670    .730

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE


4. TITLE (and Subtitle): Some Statistical Procedures Based on Distances

7. AUTHOR(s): Joseph J. Walker

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Department of Statistics
   University of North Carolina
   Chapel Hill, North Carolina 27514

11. CONTROLLING OFFICE NAME AND ADDRESS:
   Army Research Office
   Box 1221
   Research Triangle Park, NC 27709

12. REPORT DATE: November 1976

13. NUMBER OF PAGES: 107

15. SECURITY CLASS. (of this report): UNCLASSIFIED

16. DISTRIBUTION STATEMENT (of this report): Approved for Public Release: Distribution Unlimited


18. SUPPLEMENTARY NOTES: The findings in this report are not to be construed as an official Department of the Army position, unless so designated by other authorized documents.

19. KEY WORDS: Distances, Classification, Measurement errors, Asymptotic expansion,
Noncentral chi-square distribution, Clustering.



20. ABSTRACT: The distributions of statistics employed in classifying the source
of a new observation, using observed distances which were subject to measurement
errors, are discussed. A useful approximate expansion is obtained.

A new method of estimating the noncentrality parameter of a noncentral chi-square
distribution is derived.




